Summary

A front-end GUI for speech recognition that works with any engine, including engines running remotely on a separate computer.

Rationale

Speech recognition is a large and complex project, but the front end is a relatively simple component that could be written now. It may also help build a user base and momentum for the recognition engine work.

Use Cases

Scope

Design

The GUI

Windows client

A simple text-input window that can collect text from the speech engine and send it to the Ubuntu application, converting commands and macros as needed.

Text transfer protocol

A simple XML protocol to transmit the text feed plus some embedded command statements.
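As a rough illustration of what such a feed might look like on the wire, the Python sketch below wraps one chunk of dictated text in a single XML element. The `<feed>` root element, the `encode_feed` function and the restriction of command names to letters, digits, hyphens and underscores are all assumptions made for this example; the spec does not define them. Only the `<command>` element follows the convention proposed in this spec.

```python
import re
from xml.sax.saxutils import escape

# Assumption: command statements arrive in the dictated text as literal
# <command>name</command> markers, with names limited to simple tokens.
COMMAND_RE = re.compile(r"<command>([A-Za-z0-9_-]+)</command>")


def encode_feed(raw: str) -> str:
    """Wrap dictated text in a hypothetical <feed> element.

    Plain text is XML-escaped so that characters such as '&' and '<'
    cannot break the message; recognised command markers pass through
    unchanged as real XML elements.
    """
    parts = []
    pos = 0
    for match in COMMAND_RE.finditer(raw):
        # Escape the ordinary text that precedes this command marker.
        parts.append(escape(raw[pos:match.start()]))
        parts.append(f"<command>{match.group(1)}</command>")
        pos = match.end()
    # Escape whatever text follows the last command marker.
    parts.append(escape(raw[pos:]))
    return "<feed>" + "".join(parts) + "</feed>"
```

For example, `encode_feed("Tom & Jerry")` produces `<feed>Tom &amp; Jerry</feed>`. Escaping on the sending side keeps the receiver simple, since it can then hand every message to a standard XML parser.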

NaturallySpeaking and ViaVoice both support user-defined macros. The user records a voice sequence such as 'Computer: delete that sentence' or 'insert my address block' and defines the corresponding action or text block. We can use this to create a rich set of editing commands on the Ubuntu end as well. The phrase 'Computer: delete that sentence' could be linked with the macro text <command>delete-sentence</command>. The Windows client would transfer this like any other text, but the receiving application would give it special meaning and invoke the appropriate action in the host editor.

This scheme does require some configuration, such as creating the macros in the commercial system, but yields a highly configurable solution.
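The receiving side of this scheme could be sketched in Python as follows. This is only an illustration under stated assumptions: the `<feed>` root element, the `decode_feed` and `apply_events` functions, and the handler table are hypothetical names invented for the example, and the choice to ignore unknown commands silently is one possible policy, not something this spec decides.

```python
import xml.etree.ElementTree as ET


def decode_feed(message: str):
    """Split one XML message into ("text", ...) and ("command", ...) events.

    Assumes the message is a hypothetical <feed> element containing plain
    text interleaved with <command>name</command> elements.
    """
    root = ET.fromstring(message)
    events = []
    if root.text:
        # Text that appears before the first child element.
        events.append(("text", root.text))
    for child in root:
        if child.tag == "command":
            events.append(("command", (child.text or "").strip()))
        if child.tail:
            # Text that follows this child element.
            events.append(("text", child.tail))
    return events


def apply_events(events, insert_text, handlers):
    """Insert plain text and invoke the handler registered for each command."""
    for kind, value in events:
        if kind == "text":
            insert_text(value)
        else:
            handler = handlers.get(value)
            if handler is not None:  # unknown commands are skipped here
                handler()


if __name__ == "__main__":
    typed = []      # stands in for the host editor's text buffer
    invoked = []    # records which editor actions were triggered
    handlers = {"delete-sentence": lambda: invoked.append("delete-sentence")}
    msg = "<feed>Hello <command>delete-sentence</command> world</feed>"
    apply_events(decode_feed(msg), typed.append, handlers)
    print(typed, invoked)
```

Keeping the command names in a plain lookup table means new voice macros can be added by registering another handler, without touching the parsing code.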

Implementation

Outstanding Issues

BoF agenda and discussion


CategorySpec

SpeechRecognition/GUI (last edited 2008-08-06 16:18:30 by localhost)