SpeechRecognition
Launchpad Entry: speech-recognition
Packages affected:
Also see: /GUI, /SpeechMaker
Summary
A roadmap for providing speech recognition on Ubuntu (an informational spec).
Release Note
[Speech recognition will be a long project; this release note describes the first component likely to land.] A speech recognition utility lets you control your computer with simple commands like 'Open Firefox'. A new user interface makes use of existing speech recognition engines such as Sphinx.
Rationale
Robust speech recognition will be useful to many groups of users, for both dictation and navigation. There are currently no workable solutions available on Linux.
Use Cases
- Professionals who perform dictation
- Non-Latin language input
- Mobility-impaired users
- Sufferers of RSI
- Home automation (domotics) and remote control
- User authentication?
Assumptions
Making a start on certain key parts of the stack will act as a catalyst for development of the more technical aspects.
Design
Front end
Technically, the front end is the easiest part of the puzzle, and traditionally it would be left until last. In this case, however, there are good reasons for developing a good GUI early, as it can act as a catalyst for the lower-level work. See: /GUI, /SpeechMaker.
Speech recognition engines
Teams like Julius and Sphinx are working on open source solutions, but are largely held back by the lack of good free voice models, which in turn require a large body of free, high-quality voice data. The VoxForge project has been set up to provide this through community contributions, but the project needs a larger volunteer base and better end-user tools.
The front end should provide a simple way to record voice data and submit it directly to the VoxForge site. This will facilitate a distributed effort to improve recognition results. It should also be able to work with proprietary engines so that it is more immediately useful, speeding up general uptake of speech recognition on Linux.
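As a rough illustration of the recording side only, the sketch below stores a captured utterance as a WAV file with the prompt text next to it, ready for a later submission step. The function name `save_recording`, the 16 kHz 16-bit mono format and the file layout are assumptions made for this example; they are not VoxForge's actual submission format, and audio capture and upload are deliberately left out.

```python
import wave
from pathlib import Path


def save_recording(pcm_frames: bytes, wav_path: Path, prompt: str,
                   sample_rate: int = 16000) -> Path:
    """Write 16-bit mono PCM audio to wav_path and the prompt text beside it.

    Hypothetical helper: the format and layout are assumptions, not a
    VoxForge requirement. Returns the path of the prompt text file.
    """
    if len(pcm_frames) % 2:
        raise ValueError("16-bit PCM data must contain an even number of bytes")

    with wave.open(str(wav_path), "wb") as wav_file:
        wav_file.setnchannels(1)          # mono
        wav_file.setsampwidth(2)          # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_frames)

    # Keep the text the speaker read alongside the audio (same name, .txt).
    prompt_path = wav_path.with_suffix(".txt")
    prompt_path.write_text(prompt + "\n", encoding="utf-8")
    return prompt_path
```

A front end would pass in the raw bytes captured from the microphone; pairing each audio file with its prompt keeps the transcription with the recording whatever submission mechanism is eventually chosen.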
Implementation
The Google Summer of Code project Gnome Voice Control created a UI for a simple command utility. It performs tasks like controlling windows, starting programs, moving through menus and simple text manipulation.
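To make the idea of a simple command utility concrete, here is a minimal sketch of how recognised text such as 'Open Firefox' might be mapped to a program launch. It is not taken from Gnome Voice Control; the `APP_COMMANDS` table and the function names are hypothetical, and the transcript is assumed to arrive as plain text from whichever engine is in use.

```python
import subprocess
from typing import Dict, List, Optional

# Hypothetical table mapping spoken application names to launch commands.
APP_COMMANDS: Dict[str, List[str]] = {
    "firefox": ["firefox"],
    "terminal": ["gnome-terminal"],
}


def parse_command(transcript: str) -> Optional[List[str]]:
    """Return the argv for a transcript like 'Open Firefox', or None if unknown."""
    words = transcript.lower().split()
    if len(words) >= 2 and words[0] == "open":
        # Everything after "open" is treated as the application name.
        return APP_COMMANDS.get(" ".join(words[1:]))
    return None


def run_command(transcript: str) -> bool:
    """Launch the program named in the transcript; return False if not understood."""
    argv = parse_command(transcript)
    if argv is None:
        return False
    subprocess.Popen(argv)  # start the program without waiting for it to exit
    return True
```

Keeping the parsing step separate from the launch step means the command table can be tested, or swapped for another grammar, without a recognition engine or a desktop session.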
Test/Demo Plan
Comments
Moved to /Comments
SpeechRecognition (last edited 2011-03-19 15:16:56 by D9784B24)