SpeechRecognition

Please check the status of this specification in Launchpad before editing it. If it is Approved, contact the Assignee or another knowledgeable person before making changes.

Summary

A roadmap for providing speech recognition on Ubuntu (an informational spec).

Release Note

[Speech recognition will be a long project; this release note describes the first component likely to land.] A speech recognition utility lets you control your computer with simple commands like 'Open Firefox'. Its new user interface builds on existing speech recognition engines such as Sphinx.

Rationale

Robust speech recognition will be useful for many groups for both dictation and navigation. There are currently no workable solutions available on Linux.

Use Cases

  • Professionals who perform dictation
  • Non-Latin language input
  • Mobility-impaired users
  • Sufferers of RSI
  • Domotic and remote control
  • User Authentication?

Assumptions

Making a start on certain key parts of the stack will act as a catalyst to move development forward in the more technical aspects.

Design

Front end

Technically, the front end is the easiest part of the puzzle, and traditionally it would be left for the end. However, there are good reasons for developing a good GUI early in this case, as it can act as a catalyst for the more low-level work. See: /GUI, /SpeechMaker.

Speech recognition engines

Teams like Julius and Sphinx are working on open source solutions, but are largely held back by the lack of good free voice models, which in turn require a large body of free, high-quality voice data. The VoxForge project has been set up to provide this through community contributions, but the project needs a larger volunteer base and better end-user tools.

The front end should provide a simple way to record voice data and submit it to the VoxForge site directly. This will facilitate a distributed effort to improve recognition results. It should also be able to work with proprietary engines to be more immediately useful, speeding up general uptake of speech recognition on Linux.

Implementation

The Google Summer of Code project GNOME Voice Control created a UI for a simple command utility. It performs tasks such as controlling windows, starting programs, moving through menus, and simple text manipulation.
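
As a rough illustration of what a simple command utility does once an engine has returned recognised text, the sketch below maps a phrase such as 'Open Firefox' to a program to start. It is not taken from the Summer of Code project: the PROGRAMS table, the parse_command function and the program names in it are hypothetical, and a real utility would also cover window control, menus and text manipulation.

```python
# Hypothetical table of spoken program names and the command each one starts.
# These entries are illustrative assumptions, not part of any existing tool.
PROGRAMS = {
    "firefox": ["firefox"],
    "terminal": ["gnome-terminal"],
}


def parse_command(utterance):
    """Map recognised text such as 'Open Firefox' to a command list.

    Returns None when the phrase is not a known 'open <program>' command.
    """
    words = utterance.lower().split()
    if len(words) == 2 and words[0] == "open":
        command = PROGRAMS.get(words[1])
        # Return a copy so callers cannot modify the table by accident.
        return list(command) if command is not None else None
    return None
```

Under these assumptions, parse_command("Open Firefox") returns ["firefox"], which a front end could then pass to subprocess.Popen; unrecognised phrases return None so the interface can ask the user to repeat the command.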

Test/Demo Plan

Comments

Moved to /Comments

SpeechRecognition (last edited 2011-03-19 15:16:56 by D9784B24)