##(see the SpecSpec for an explanation)

''Please check the status of this specification in Launchpad before editing it. If it is Approved, contact the Assignee or another knowledgeable person before making changes.''

 * '''Launchpad Entry''': UbuntuSpec:speech-recognition
 * '''Packages affected''':
 * '''Also see''': [[/GUI]], [[/SpeechMaker]]

== Summary ==

A roadmap for providing speech recognition on Ubuntu (an informational spec).

== Release Note ==

''[Speech recognition will be a long project; this release note blurb describes the first component likely to land.]''

A speech recognition utility lets you control your computer with simple commands like 'Open Firefox'. A new user interface utilises existing voice recognition engines like Sphinx.

== Rationale ==

Robust speech recognition will be useful to many groups, for both dictation and navigation. There are currently no workable solutions available on Linux.

== Use Cases ==

 * Professionals who perform dictation
 * Non-Latin language input
 * Mobility-impaired users
 * Sufferers of RSI
 * Domotics and remote control
 * User authentication?

== Assumptions ==

Making a start on certain key bits of the stack will act as a catalyst to move development forward in the more technical aspects.

== Design ==

=== Front end ===

Technically, the front end is the easiest part of the puzzle, and traditionally it would be left until the end. However, there are good reasons for developing a good GUI early in this case, as it can act as a catalyst for the more low-level work. See: [[/GUI]], [[/SpeechMaker]].

=== Speech recognition engines ===

Projects like Julius and Sphinx are working on open source solutions, but are largely held back by the lack of good free voice models, which in turn require a large body of free, high-quality voice data. The Vox``Forge project has been set up to provide this through community contributions, but the project needs a larger volunteer base and better end-user tools.

The front end should provide a simple way to record voice data and submit it directly to the Vox``Forge site. This will facilitate a distributed effort to improve recognition results. It should also be able to work with proprietary engines so that it is more immediately useful, speeding up general uptake of speech recognition on Linux.

== Implementation ==

The Google Summer of Code project [[http://live.gnome.org/GnomeVoiceControl|Gnome Voice Control]] created a UI for a simple command utility. It performs tasks like controlling windows, starting programs, moving through menus and simple text manipulation.

== Test/Demo Plan ==

== Comments ==

Moved to [[/Comments]]