Mozilla's open-source speech recognition

Tony Atkins tony at raisingthefloor.org
Fri Dec 1 10:20:26 UTC 2017


Hi, Alan:

I studied voice recognition quite a bit in the 90s as part of my master's,
and wanted to comment with my impressions.  In those days, the tradeoff was
between being able to recognise a very limited vocabulary for a wide range
of speakers, or training a computer to recognise a wide vocabulary from a
single speaker.  With a massive body of training data (and with newer
and faster computers), solutions like this are now really good at many
limited vocabularies and pretty good at open-ended recognition, all
without training for the individual speaker.

In the late 90s there were systems that used speaker-independent
recognition to do things like check flight times, relying on their ability
to understand one or two specific sets of words (dates and times, places
you might fly to/from).  These days, there are still clear limits, but
systems can recognise which of dozens of contexts you're talking about, and
then understand a much deeper vocabulary within each context.

For open-ended speech, they claim around 95% accuracy, which is actually
really good for speaker-independent recognition.  However, as a starting
point for things like automatically adding subtitles, 95% is still
noticeably and sometimes laughably off.  The good news is that with tools
like YouTube's subtitles editor, human reviewers can focus on transcription
errors and the timing of the subtitles, instead of also typing in the 95%
the speech-to-text engine captures successfully.  And even that 95% is
usually better than nothing.
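
To put the 95% figure in perspective, here is a minimal sketch of how a
per-word score could be computed.  It assumes "accuracy" means one minus
the word error rate (edit distance over words, divided by the length of
the reference transcript); that is my assumption, and the announcement may
measure it differently.  The function name and the sample sentences are
purely illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical 20-word reference with a single misrecognised word.
    reference = ("the quick brown fox jumps over the lazy dog and then runs "
                 "across the wide green field toward the river")
    hypothesis = reference.replace("river", "liver")
    wer = word_error_rate(reference, hypothesis)
    print(f"WER: {wer:.2f}, accuracy: {1 - wer:.2f}")
```

Under that assumption, 95% accuracy works out to roughly one wrong word in
every twenty, which is why raw subtitles can still read oddly even though
most of the typing has been done for the reviewer.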

I also love that they provide not only the specific tool, but also the
dataset they used to train it.  The same data can be used to find better
answers to this problem, but can also be used in unexpected ways, for
example, identifying and gaming an engine for specific accents.

Anyway, thanks for sharing this.

Cheers,

Tony

On 30 November 2017 at 15:03, Harnum, Alan <aharnum at ocadu.ca> wrote:

> Interesting news on this front:
>
> https://blog.mozilla.org/blog/2017/11/29/announcing-the-initial-release-of-mozillas-open-source-speech-recognition-model-and-voice-dataset/
>
> Node is one of the initial supported bindings:
> https://github.com/mozilla/DeepSpeech#using-the-nodejs-package