We interview Tuomas Virtanen, long-time collaboration partner and professor at Tampere University of Technology.
When it comes to innovation and technological advancement, we certainly live in interesting times. Computers are becoming exponentially more powerful every decade, and the field of machine learning and neural networks is no exception. Last week, Tuomas Virtanen, professor of audio signal processing at Tampere University of Technology and a long-time collaboration partner of Eriksholm, visited us to share his expertise and visions for the future of machine learning in the hearing care industry.
Tuomas Virtanen spearheads several interesting research projects. The largest ongoing project within the group revolves around source separation in sound, i.e. separating specific voices or sound sources from a complex recording. They also work on content analysis of sound, such as the automatic identification of sound events (a baby crying, glass breaking, etc.) and the recognition of environmental sound scenes such as traffic, nature and city ambience.
We at Eriksholm Research Centre have collaborated with Tuomas and his research group since 2012. We worked as partners on the European Inspire project, and one of Tuomas’ students, Tom Barker, visited Eriksholm in 2014 to learn more about the potential of source separation in hearing aids. Since then, our research units have been working closely together on cracking this particular challenge. We apply the speech separation algorithms to separate competing voices, thus allowing the listener to more easily segregate (hear out) the individual voices and focus attention on the most interesting voice at any time.
What is the point of speech separation?
“There are many good reasons. You can do it for human listeners to improve speech intelligibility, for example. It is also useful as a part of computational analysis methods. For instance, when you are trying to recognize speech automatically, any interfering sounds will make the task more difficult.”
What is a deep neural network?
“A neural network (or artificial neural network) is a set of processing nodes, which take some input values, do some simple operations, and convert that to some output. When you stack multiple layers of those – often many, many layers – then you can do operations with more complex structures of input, and these layers of neurons can learn to perform many very difficult or resource-intensive tasks automatically, such as analyzing and recognizing specific sound sources. That is a deep neural network.”
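To make the "stacked layers of simple operations" idea concrete, here is a minimal sketch (our own illustration, not code from Tuomas' group) of a forward pass through a small network in NumPy, with made-up layer sizes:

```python
import numpy as np

def relu(x):
    # the simple non-linear operation applied at each node
    return np.maximum(0.0, x)

def forward(x, layers):
    """Pass input x through a stack of (weights, bias) layers.

    Each layer performs a simple operation: a weighted sum plus a
    bias, followed by a non-linearity. Stacking many such layers is
    what makes the network 'deep'.
    """
    for w, b in layers:
        x = relu(x @ w + b)
    return x

rng = np.random.default_rng(0)
# three stacked layers: 4 inputs -> 8 hidden -> 8 hidden -> 2 outputs
sizes = [(4, 8), (8, 8), (8, 2)]
layers = [(rng.standard_normal(s), np.zeros(s[1])) for s in sizes]
out = forward(rng.standard_normal(4), layers)
```

In a real system the weights are not random: they are learned from training data, which is what the next answers touch on.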
Tuomas’ work with neural networks began in 2013, shortly before they became hugely important to research in many fields. “I thought ‘this is a technique that I should learn, and start using.’ I did. And it paid off.”
Deep neural networks (DNNs) are incredibly important to so many industries today. Why do you think that is?
“Well, DNNs have been used to get very good results in many areas, like speech separation, image classification, and so on. Therefore, they are very powerful as a supervised machine-learning tool, where you have some training data and define the targets you want to achieve.
There are so many areas where deep neural networks can be applied. In addition to classification-related tasks (like identifying a sound or analyzing an image), there is also speech synthesis, music synthesis, or any problem where you have a clearly defined target, really, and can provide training data to the network.”
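As a simplified sketch of what "training data and defined targets" means in supervised learning (a toy example of ours, far smaller than any real network), here a single-layer model learns to reproduce targets from labelled data by gradient descent:

```python
import numpy as np

rng = np.random.default_rng(2)
# toy supervised task: training data x with defined targets y
x = rng.standard_normal((200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = (x @ true_w > 0).astype(float)  # the targets we want the model to achieve

w = np.zeros(3)
for _ in range(500):
    pred = 1.0 / (1.0 + np.exp(-(x @ w)))  # sigmoid output in [0, 1]
    grad = x.T @ (pred - y) / len(y)       # gradient of the cross-entropy loss
    w -= 0.5 * grad                        # gradient-descent step

pred_labels = 1.0 / (1.0 + np.exp(-(x @ w))) > 0.5
accuracy = (pred_labels == y.astype(bool)).mean()
```

A deep network works the same way, just with many stacked layers and far more parameters; the principle of fitting outputs to defined targets is identical.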
Anyone whose smartphone recognizes a piece of music in a store, identifies the subject in a picture, or reacts to spoken commands and questions – as Apple’s Siri does, for instance – is relying on deep neural networks.
What is the potential for deep neural networks in speech separation?
“There is very good potential. They have been used to get a much better separation quality compared to previous methods. So far, the use of deep neural networks for speech separation has mainly been focused on controlled lab experiments, where we mix many different sounds and evaluate the performance. Therefore, I think there are a lot of interesting challenges when we move to real-world conditions. Then there are factors that we cannot control directly. We will have to find new ways to ensure the quality and diversity of our data.”
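One common way such lab experiments are set up (a simplified sketch, not the group's actual pipeline) is in the time-frequency domain: sources are mixed, and a network is trained to predict a mask that, multiplied with the mixture, recovers one source. With oracle access to the sources, the "ideal ratio mask" the network learns to approximate looks like this on toy data:

```python
import numpy as np

rng = np.random.default_rng(1)
# toy magnitude spectrograms: 64 frequency bins x 100 time frames
# (real spectrograms come from a short-time Fourier transform)
speech = np.abs(rng.standard_normal((64, 100)))
noise = np.abs(rng.standard_normal((64, 100)))
mixture = speech + noise  # toy additive mixture of the two sources

# ideal ratio mask: per-bin fraction of energy belonging to speech;
# a DNN separator is trained to predict such a mask from the mixture alone
irm = speech / (speech + noise + 1e-8)

# applying the mask to the mixture yields an estimate of the clean speech
speech_est = irm * mixture
```

In a controlled experiment the clean sources are known, so separation quality can be measured directly; in the real-world conditions Tuomas mentions, no such oracle exists, which is exactly what makes the move out of the lab hard.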
How could this technology be used in a hearing aid?
“There is great potential for using deep neural networks actively in hearing aids. They are computationally quite cheap, and very scalable. I think the way it will be used – though this is still an open question – is having a generic model for different types of sounds, and ways to recognize them. Of course, the user may want to adapt those models to their particular needs and the sounds they encounter in their daily lives, so we have some way to go before the technology is mature enough for that.”