GnomeSpeechReplacement

Launchpad Entry: desktop-karmic-gnome-speech-replacement
Created: 2009-01-02
Contributors: Luke Yelavich
Packages affected: gnome-speech, speech-dispatcher, gok, dasher, gnome-orca

Summary

Replace the GNOME-specific speech API, gnome-speech, with a desktop-agnostic speech API, speech-dispatcher, to meet the speech service requirements of GNOME 3.0.

Release Note

GNOME accessibility technologies now use speech-dispatcher to deliver text-to-speech output to users. Speech-dispatcher provides a flexible architecture for converting text into synthesized speech, with support for several speech synthesizers out of the box, such as espeak and festival. Several languages are also supported.

Rationale

There are plans to deprecate bonobo for GNOME 3.0. The GNOME accessibility technologies currently use bonobo rather heavily to provide access to the GNOME desktop. The other areas that need addressing are discussed at http://live.gnome.org/Accessibility/BonoboDeprecation.

Speech-dispatcher is currently the best alternative, as it already works with Orca, and can be used with other accessibility programs on Linux, such as BrlTTY, speakup, and yasr. However, in order to be a viable speech backend for GNOME accessibility technologies, and indeed for Linux, some important changes need to be made to speech-dispatcher itself. This specification details the changes needed for speech-dispatcher to replace gnome-speech for GNOME 3.0 and beyond.

The work to transition GNOME to speech-dispatcher is expected to take from now until GNOME 3.0 is released. For Karmic and the GNOME 2.27 cycle, the focus will be on what is discussed in this specification: getting speech-dispatcher ready for use with gnome-orca, including enabling speech-dispatcher in Ubuntu. The following six months, for GNOME 2.28/3.0, will be spent fixing bugs and creating additional GUI configuration tools to make speech-dispatcher easier to configure. This work will be discussed in a future specification.

Implementation

Speech-dispatcher already works quite well with Orca; however, some extra functionality needs to be added before speech-dispatcher can be considered a usable default speech backend for Orca and other GNOME accessibility technologies. A full specification of the work that needs to be done for upstream adoption is documented here: http://www.themuso.id.au/speech/speech-dispatcher-orca-integration.txt. Changes that are relevant to Ubuntu, but also apply to upstream, are documented in the following sections.

UI Changes

Additional command-line arguments need to be added to the speech-dispatcher daemon to allow for the enabling or disabling of specific functionality, such as:
- logging - If logging is completely disabled, one should be able to re-enable it with a command-line flag for debugging purposes.
- server/client communication - Command-line flags need to be added to force the daemon to communicate via a method, allowed IP address, or port different from what is set in the configuration file.

Code Changes

Speech-dispatcher's logging architecture needs to be refactored so that logfile output varies more with the requested log level. Currently, with speech-dispatcher's log level set to 0, logs are still produced, and can take up a good 15 to 20 megabytes of disk space after a complete day's work. Log level 0 should produce no logging whatsoever. In addition, the driver modules should check the log level setting themselves, and only output log data when the log level is greater than 0. This could also improve the performance of synthesizer drivers, although the gain would likely be negligible.
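
The intended behaviour can be illustrated with a minimal Python sketch. It is not speech-dispatcher code: the class LeveledLog and its methods are hypothetical names, and the only assumption taken from this specification is that level 0 must write nothing while higher levels write progressively more.

```python
import io


class LeveledLog:
    """Hypothetical logger: level 0 is silent, higher levels are more verbose."""

    def __init__(self, level, stream):
        self.level = level    # configured log level (0 disables logging)
        self.stream = stream  # any file-like object

    def log(self, msg_level, message):
        # Every message carries a level of 1 or more, so a configured
        # level of 0 can never match and nothing is written.
        if msg_level < 1:
            raise ValueError("message levels start at 1")
        if self.level >= msg_level:
            self.stream.write(f"[{msg_level}] {message}\n")


def demo(level):
    """Return what would be logged at the given configured level."""
    buf = io.StringIO()
    log = LeveledLog(level, buf)
    log.log(1, "driver loaded")
    log.log(3, "synthesizing chunk")
    return buf.getvalue()
```

A driver module would perform the same comparison itself before formatting a message, so that no work is done when logging is disabled.
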
The TCP port used for server-client communication in a user session needs to be unique, for example a base port number plus the user's UID (e.g. port 6560 + UID). The C and Python bindings need to be extended to use this port.
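
As a worked example of the base-plus-UID scheme, the following Python sketch computes a per-user port. The function name session_port is hypothetical, and the range check is an addition not discussed in this specification: TCP ports end at 65535, so a plain sum cannot serve very large UIDs.

```python
import os

BASE_PORT = 6560  # base port suggested in this specification
MAX_PORT = 65535  # highest valid TCP port number


def session_port(uid=None, base=BASE_PORT):
    """Return base + UID, or raise ValueError if that is not a valid port."""
    if uid is None:
        uid = os.getuid()  # UID of the user running the session
    if uid < 0:
        raise ValueError("UID must not be negative")
    port = base + uid
    if port > MAX_PORT:
        raise ValueError(f"port {port} exceeds {MAX_PORT} for UID {uid}")
    return port
```

With the default base, UID 1000 maps to port 7560, while a UID of 60000 is rejected because 66560 is outside the valid port range.
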
Audio output selection needs to be more flexible. Speech-dispatcher currently allows one audio output option to be configured for the daemon and its speech driver instances. If the chosen audio output fails or cannot be accessed, speech-dispatcher should fall back to a preset fallback audio output or, if none is given, to a default audio output, continuing to the next preferred choice if need be. In addition, the user should not be made aware of the change, unless they have configured speech-dispatcher to notify them of the fallback.
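
One way to picture this fallback chain is the Python sketch below. All names are hypothetical: pick_audio_output is not part of speech-dispatcher, try_open stands in for whatever the daemon does to open an output, and the backend names in the usage note are only illustrative.

```python
def pick_audio_output(preferred, fallbacks, default_order, try_open):
    """Return (name, fell_back) for the first audio output that opens.

    preferred     -- the output configured by the user
    fallbacks     -- preset fallback outputs from the configuration (may be empty)
    default_order -- built-in order tried after the preset fallbacks
    try_open      -- callable returning True if the named output is usable
    """
    candidates = [preferred] + list(fallbacks) + list(default_order)
    tried = set()
    for name in candidates:
        if name in tried:
            continue  # never retry an output that already failed
        tried.add(name)
        if try_open(name):
            # fell_back lets the caller decide whether to notify the user
            return name, name != preferred
    raise RuntimeError("no usable audio output")
```

For instance, with preferred set to "pulse", no preset fallbacks, and a default order of "alsa" then "oss", a machine where only "oss" opens yields ("oss", True). The caller would announce the change only if the user has asked to be told.
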
The speech-dispatcher client API needs to be extended to allow a client to request a particular punctuation setting, and for speech-dispatcher to return a flag to tell the client whether it is in control of managing punctuation, depending on the user settings for the speech synthesizer in question.
The above API changes for punctuation also need to be made for handling capital letters, word pronunciations, and audio cues.
Speech-dispatcher should be extended to index by word as well as by sentence/utterance, possibly with this made configurable per client.
Each synthesizer driver needs to have a lookup table to allow different punctuation settings per supported synthesizer language. This table should be hand editable. Such tables may possibly be required at the system and user level.
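
The Python sketch below shows one possible reading of such a table for a single driver, with a user-level table overriding a system-level one. The table layout, the function name spoken_punctuation, and the sample entries are all illustrative assumptions; the specification does not define a format.

```python
# Illustrative system-level table for one driver: language -> character -> spoken form.
SYSTEM_TABLES = {
    "en": {"!": "exclamation mark", "?": "question mark"},
    "de": {"!": "Ausrufezeichen", "?": "Fragezeichen"},
}


def spoken_punctuation(char, language, user_tables=None, system_tables=SYSTEM_TABLES):
    """Look up how a punctuation character is spoken in the given language.

    User-level entries take precedence over system-level ones.
    Returns None when neither table has an entry.
    """
    for tables in (user_tables or {}, system_tables):
        table = tables.get(language, {})
        if char in table:
            return table[char]
    return None
```

In practice the tables would be loaded from hand-editable files rather than defined in code.
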
Dasher and gok will need to be ported to work with speech-dispatcher's client library.

Migration

No configuration migration is required, except where the user is already using speech-dispatcher for daily use. The only configuration option that may be affected is the choice of audio backend, since the user's existing configuration will not take graceful audio backend fallback into consideration. The user needs to be warned about this, probably with an initial audio warning message when speech-dispatcher loads.
Speech-dispatcher needs to be promoted to main, and gnome-speech needs to be demoted to universe. This will allow for speech-dispatcher to be the new default speech backend for gnome-orca, but still allow users to use gnome-speech if they wish. Having gnome-speech around will be useful in transitioning to speech-dispatcher, so that users can file bugs for any issues they have with speech-dispatcher, as opposed to how gnome-speech behaves.

Test/Demo Plan

To test speech-dispatcher functionality, one can follow the steps below, which exercise various parts of this implementation.

To test the automatic loading of the speech-dispatcher server when a user wishes to have a text string spoken, one only needs to run the spd-say command with the text they wish to have spoken. For example "spd-say Hello World." should cause speech-dispatcher to speak hello world. In doing this, the speech-dispatcher server should automatically load and speak the text.
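
The same check can be scripted. The Python sketch below wraps the spd-say command described above; the helper names spd_say_command and speak are hypothetical, and the script assumes spd-say is on the PATH, returning False instead of failing when it is not.

```python
import shutil
import subprocess


def spd_say_command(text):
    """Build the argument list for speaking text with spd-say."""
    return ["spd-say", text]


def speak(text):
    """Speak text via spd-say; return False if spd-say is not installed."""
    if shutil.which("spd-say") is None:
        return False
    # check=True raises CalledProcessError if spd-say exits with an error
    subprocess.run(spd_say_command(text), check=True)
    return True


if __name__ == "__main__":
    speak("Hello World.")
```
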

To test speech-dispatcher with gnome-orca, one should do the following:

Enable accessibility support for GNOME by going to the System menu, choosing Preferences, and selecting Assistive Technologies.
Check the "Enable assistive technologies" check box, and close the window.
Log out, and back into the GNOME session.
Run Orca from the run dialog box with the command "orca -s". This should bring up a GUI preferences window, and Orca should speak.

NOTE: In both of the above cases, speech-dispatcher's audio output may break up occasionally. This is due to speech-dispatcher's pulseaudio output code, which is not currently performing as well as required. A short-term workaround should be implemented by the final Karmic release. In the long term, this code needs to be optimized, along with other parts of speech-dispatcher.

Unresolved issues

A modular configuration program needs to be written, offering an interface in console, ncurses, GTK, and, when full accessibility support is available, Qt. This program would allow users to configure speech-dispatcher's various settings, as well as settings for individual speech synthesizers, without the need to hand-edit configuration files. This program should also allow speech-dispatcher clients to request that individual speech synthesizer configuration windows be shown, so that the client can present more speech configuration to the user seamlessly, without the user having to worry about what is producing the speech behind the scenes.
Since this specification also discusses work that needs to go upstream, the specification implementer will have to make an effort to ensure that the code written for this specification works on all GNOME-supported platforms, including Solaris, and functions in a Sun Ray-like client/server configuration.
Currently, the only well-known and widely used proprietary speech synthesizer driver available for speech-dispatcher is IBMTTS/TTsynth/Voxin. Other speech synthesizers, like DECtalk and Cepstral, are supported only by using the sd_generic driver to wrap the synthesizers' various command-line speech utilities, like DECtalk's say. Drivers will need to be written for these speech synthesizers.
Speech-dispatcher needs to be extended to allow for the building of speech drivers outside the speech-dispatcher source tree, to facilitate easier setup of proprietary speech synthesizers to be used with speech-dispatcher.
An install/setup standard needs to be defined for easier proprietary speech synthesizer setup, including the compilation of the speech-dispatcher driver. The author of this specification has some ideas, but has yet to sketch out an initial mockup and specification. The idea is based on what dkms does for kernel modules.
Speech-dispatcher must be able to handle clients running at the system level, such as speakup and BrlTTY.
