Human speech processing inspired automatic speech recognition

Automatic speech recognition (ASR) systems are used extensively, e.g., in mobile phones and laptops, for a range of different tasks. They work well in restricted settings (e.g., few different speakers, quiet background) but tend to break down when the speech and listening conditions are highly diverse (e.g., many different speakers, background noise, or a speech pathology) or when limited data is available for the language for which the system is built (i.e., low-resource languages). This makes their integration into human-machine and human-robot interaction systems problematic.

Human, native listeners of a language are the optimal speech recognisers. The central questions addressed in this research theme are:

  1. Why are human listeners so much better at recognising speech than computers?
  2. Can we use knowledge about human speech processing to improve ASR systems?

These questions are investigated using a range of research techniques, including machine learning (primarily deep learning) and human listening experiments (including EEG).

Research in this theme focuses on, but is not limited to:

  • Building speech technology for under-resourced languages and languages without a common written language.
  • Visualising the speech representations in deep neural networks.
  • Using knowledge about human speech processing to help improve ASR algorithms.
  • Systematic comparisons between human and automatic speech processing architectures and performance.
  • Building computational models of human speech processing using techniques from ASR.

Coordinator: Odette Scharenborg