SpeechRecognition

Please check the status of this specification in Launchpad before editing it. If it is Approved, contact the Assignee or another knowledgeable person before making changes.

Launchpad Entry: speech-recognition
Packages affected:
Also see: /GUI, /SpeechMaker

Summary

A roadmap for providing speech recognition on Ubuntu (an informational spec).

Release Note

[Speech recognition will be a long project; this release note describes the first component likely to land.] A speech recognition utility lets you control your computer with simple spoken commands such as 'Open Firefox'. A new user interface builds on existing speech recognition engines such as Sphinx.

Rationale

Robust speech recognition will be useful for many groups for both dictation and navigation. There are currently no workable solutions available on Linux.

Use Cases

Professionals who perform dictation
Non-Latin language input
Mobility impaired
Sufferers of RSI
Domotic and remote control
User Authentication?

Assumptions

Making a start on certain key parts of the stack will act as a catalyst for development of the more technical aspects.

Design

Front end

Technically, the front end is the easiest part of the puzzle, and traditionally it would be left until the end. However, there are good reasons to develop a good GUI early in this case, as it can act as a catalyst for the lower-level work. See: /GUI, /SpeechMaker.

Speech recognition engines

Projects such as Julius and Sphinx are working on open source engines, but they are largely held back by the lack of good free voice models; building such models in turn requires a large body of free, high-quality voice data. The VoxForge project has been set up to provide this through community contributions, but it needs a larger volunteer base and better end-user tools.

The front end should provide a simple way to record voice data and submit it directly to the VoxForge site. This will facilitate a distributed effort to improve recognition results. It should also be able to work with proprietary engines so that it is more immediately useful, speeding up general uptake of speech recognition on Linux.
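As a rough illustration of the recording side, the sketch below stores one captured utterance as a WAV file and logs the prompt text that was read. It assumes the audio has already been captured as 16-bit PCM samples; the function name save_sample, the prompts.txt index layout and the 16 kHz default are hypothetical choices for this example and are not taken from VoxForge's actual submission format.

```python
import struct
import wave
from pathlib import Path


def save_sample(directory, sample_id, prompt, pcm_samples, rate=16000):
    """Write one utterance as a mono 16-bit WAV file and log its prompt text.

    pcm_samples is a sequence of signed 16-bit integers. The sample rate
    and the index file layout are assumptions made for this sketch.
    """
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)

    wav_path = directory / f"{sample_id}.wav"
    with wave.open(str(wav_path), "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        # Pack the integers as little-endian signed shorts.
        wav.writeframes(struct.pack(f"<{len(pcm_samples)}h", *pcm_samples))

    # Hypothetical index: one "<sample id> <prompt text>" line per utterance.
    with open(directory / "prompts.txt", "a", encoding="utf-8") as index:
        index.write(f"{sample_id} {prompt}\n")

    return wav_path
```

A front end could call this once per prompt the user reads aloud, then bundle the resulting directory for upload. The upload step is left out here because it depends on what the receiving site accepts.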

Implementation

The Google Summer of Code project Gnome Voice Control created a UI for a simple command utility. It performs tasks like controlling windows, starting programs, moving through menus and simple text manipulation.
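To make the idea of a simple command utility concrete, here is a minimal sketch of how recognised text such as 'Open Firefox' might be mapped to an action. It is not how Gnome Voice Control is implemented; the names parse_command, run_command and LAUNCHERS, and the program names in the table, are hypothetical. The recognition engine is assumed to have already produced a plain-text transcript.

```python
import subprocess

# Hypothetical table of spoken application names and the programs they start.
LAUNCHERS = {
    "firefox": ["firefox"],
    "terminal": ["gnome-terminal"],
}


def parse_command(transcript, launchers):
    """Return the argv list for a phrase like 'Open Firefox', or None."""
    words = transcript.lower().split()
    if len(words) >= 2 and words[0] == "open":
        return launchers.get(" ".join(words[1:]))
    return None


def run_command(transcript, launchers=LAUNCHERS, spawn=subprocess.Popen):
    """Start the program named in the transcript; report whether it matched."""
    argv = parse_command(transcript, launchers)
    if argv is None:
        return False
    spawn(argv)
    return True
```

Passing the spawn function as a parameter keeps the phrase matching testable without starting real programs. Unrecognised phrases return False so the caller can decide how to report them to the user.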

Test/Demo Plan

Comments

Moved to /Comments

SpeechRecognition (last edited 2011-03-19 15:16:56 by D9784B24)