Opening up voice technology for all

Warm up your vocal cords, because typing will soon seem very old-fashioned.

Michael Henretty. Photo by Kasia Odrozek (CC BY-SA 4.0)

Gradually, machines are being trained to recognize different languages, words and accents. As voice recognition technology improves, more computers and devices will be “listening” for your inputs and commands, and even understanding what you want.

At risk of being left behind in this technological shift are small-scale software developers without access to giant voice recognition databases (like those of Amazon, Microsoft or Apple), as well as Internet users worldwide who speak minority languages or dialects.

Mozilla’s Common Voice project is one of only a few efforts to create a large, open and publicly available voice dataset that anyone can download and use freely.

By pairing hundreds of hours of spoken audio recordings with the written words they correspond to, the dataset can be used to train computers to understand voices. To support this effort, thousands of people have donated their own voice recordings to Common Voice through a simple Web interface.

This year, Common Voice is expanding to include languages beyond English. Michael Henretty, a digital strategist working with Mozilla’s open innovation team, is optimistic about the future of open-source voice technology, but sees more work to be done.

Q: How do you imagine that Common Voice data will be used in the future?

A: We are using the English voice data to improve Mozilla’s own speech recognition engine, code-named “DeepSpeech,” and we hope to enable others to improve their open source engines as well.

Already we have seen some adoption, with popular open source projects like Kaldi integrating the data. We are also in talks with several universities to use the data for research initiatives.

But probably the most important goal of Common Voice is to bring speech technology to languages and communities where market forces will be too slow. For instance, could speech recognition give minority language speakers access to technology and the services the Internet provides, even if they never learned how to read?

Q: What steps is your team taking to go bigger and multilingual?

A: We are working with a great open source translation project, Tatoeba, to enable more communities to collect voice data in whatever language, dialect or accent they want. Aside from this, we are working on making our own website more fun to interact with. So far, only around 10% of people who visit the site actually donate their voice, and those who do usually donate only once. So we are looking into how to make Common Voice more social and rewarding.

Further reading:

Common Voice by Mozilla
“Google, Mozilla and the Race to Make Voice Data for Everyone,” Fast Company, 2017
