SpeechRecognition/Comments

Comments

We need voice tools if we are going to get into mainstream corporate and government desktops. If we are going to be an alternative to MS Vista in these markets, speech-to-text is MORE IMPORTANT than 3D desktops and video codecs.

SusanCragin Simple speech recognition for text entry ONLY is now available by running Dragon NaturallySpeaking (DNS) on Ubuntu under WINE. I use DNS 9.0 Preferred myself. Right now this is useful only for text entry into the supplied DragonPad or the WINE version of Notepad. This is a major step forward. DNS is a good product. As I write, DNS 9.5 Standard and Preferred do NOT work. DNS 7 and work in Standard and Preferred.
EricSJohansson The ability to record voice data is only a tiny fraction of what's needed to turn current open-source speech recognition engines into something usable. In addition to the basic recognition engine, you also need a vocabulary of 80,000 to 120,000 words, language modeling, adaptation processes, correction processes, and accessibility features for injection and correction of dictated text. In a nutshell, expect to spend between $5 million and $10 million worth of effort to make a usable speech recognition environment with current open-source tools. While this is a laudable goal, disabled people can't afford to wait. Today, we go with what we have, which means a commercial product (i.e. NaturallySpeaking) on Windows, and cobble together a variety of tools that enable us to interact with Linux applications. The end result is a somewhat usable environment that could use some significant improvements.
- Comment: I wonder whether you are familiar with the current state of the open-source projects you mention. Yes, vocabulary is lacking, but the other features you list (language modeling, adaptation, correction, etc.) do exist in current open-source solutions. I understand that you would like to use NaturallySpeaking, as you've probably been using it for a long time, but your words sound a bit harsh towards projects like Sphinx-4, which is in many ways quite an advanced speech recognition project. I tested it with some degree of success years ago. With VoxForge and, of course, an easy UI if one doesn't exist yet, usable open-source speech recognition might not be that far away. That doesn't mean good instructions for using Wine + NaturallySpeaking shouldn't be written, but they are not really related to this specification. Anyone is free to start writing e.g. scripts that enable NaturallySpeaking usage in some way, but it's always going to be an ugly hack. Anyway, NaturallySpeaking version 7 can apparently be made to work under Wine, so that would be a starting point for anyone interested. It might still prove quite a task to get any reasonable interaction between the Wine application and the desired target application.
  - EricSJohansson The last time I looked at the open-source speech recognition projects (approximately a year ago), the various researchers/developers involved with Julius, Sphinx 3, and Sphinx 4 all stated that they were many years behind NaturallySpeaking in terms of usability, accuracy and speed. If memory serves, all of the current open-source systems work well with a few thousand words and very simple grammars. When you increase vocabulary sizes, you decrease performance to the point where 20,000 words in Sphinx 4 is something like 1 1/2 to 2 1/2 times real time (i.e. a 10 second dictation would take 15 to 25 seconds to recognize). From what I've seen, most of the open-source recognition engines tackle the easy problem of IVR systems (i.e. constrained grammars and vocabularies). A free-form recognition engine such as NaturallySpeaking is much more complex. With a 120,000 word vocabulary, it runs close enough to real time that I can live with it.
    As for NaturallySpeaking 7, http://appdb.winehq.org/appview.php?versionId=3227 I know some people in the local speech recognition group have tried but have not succeeded in getting it to run under Wine. But the problem is not just speech recognition. You have audio issues. For example, when I use a wired microphone, I'm always running with two sound cards, one of them USB. The last time I tried that configuration (nine months ago), it failed miserably. Right now, I'm using Bluetooth (VXI blue parrot) and I would love to see some evidence that Bluetooth headsets like these will work in Linux. The second area of concern is how one can do more than simple text injection and voice-driven editing. In NaturallySpeaking, this is called Select-and-Say, which allows you to verbally select a region and then perform operations on it. The third area of concern is navigation to fields or features by name. If I'm sounding harsh, it's because it is damn frustrating not having hands that work. I see those who do have hands going off and trying to solve a problem that's already been solved, and leaving for later the currently unsolved and potentially harder problems of coupling speech recognition to Linux. My priorities (ordered most important to least) are local system dictation and correction (i.e. Select-and-Say), remote system dictation and correction, speaking menus, and mouse control. The one functionality you should never give us is speaking a keyboard (saying letters). While you occasionally need to say letters to spell out individual words, having to do so on a regular basis is a cruel trick to play on a handicapped person. I've said this before and I'll keep repeating it until I'm blue in the face: for handicapped accessibility, functionality trumps politics. Open source is nice but not essential.
    Start with what works, build open-source wrappers around that to build a better handicap-friendly environment than Windows could ever have, then replace closed-source components with open-source components of equivalent functionality so the user can choose. People have been waiting for real desktop speech recognition working with Linux for almost a decade now. Choosing an all-open-source route means more handicapped people will be choosing closed source because there is no other option.
    - Yeah, the point is that in that case people should just continue to use Windows, as sadly they don't have a choice. I understand to some degree that it could be a somewhat easier task to wrap a closed-source Windows program so that it works with Linux applications in the necessary way, but it is still highly questionable whether it could be done smoothly at all, and failure would mean a huge waste of resources for zero results. All in all, it's not in the scope of this specification / project, which is about finding an open-source solution to the problem. The fact that it's not coming this year in its perfect form just means that the world is not a perfect place, and never will be.
      The wrapper issue should be brought up with people who are, for example, Wine / Windows experts. I don't see it as an easy job to find people / resources willing to write a huge piece of wrapper software and make massive (?) Wine improvements just to support a closed-source company that's unfortunately not doing a Linux version of its program. Anyway, this specification is surely not the place to find the people and money to do the wrapper job, but possibly some e.g. handicapped people's association could try to hire an open-source company like Codeweavers to actually attempt the Wine + integration work. Ubuntu, however, is about doing open-source software. That does not mean Ubuntu would not like to help e.g. handicapped people (it's actually one of the most important goals for Ubuntu); it just means that Ubuntu wants to resolve the problems in "the right way", with a real solution that is not a hack. The long-term goal is a high-quality open-source solution that does not go away. Any closed-source wrapper development work can be swept away when e.g. computer architectures change or anything else comes up, while the open-source solution, when it appears, will help all the handicapped people of the world from that day forwards. In the shorter term, various people are forced to use closed-source solutions all the time, and some handicapped people (for example those needing speech recognition, while others with other needs are already served) are forced to as well. Providing a wrapper for NaturallySpeaking would not actually free those handicapped people to use their computers as they wish, since they would be bound to a closed-source program anyway.
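Comment: For readers unfamiliar with the "times real time" figures quoted earlier in this thread, the arithmetic is just a multiplication of the audio length by the real-time factor. The following is a minimal Python sketch of that calculation; the function names are made up for illustration and do not come from Sphinx, Julius, or NaturallySpeaking, and the 1.5 and 2.5 factors are the recollected figures from the comment above, not measurements.

```python
def recognition_time(audio_seconds: float, real_time_factor: float) -> float:
    """Seconds needed to recognize `audio_seconds` of speech.

    A real-time factor of 1.0 means recognition keeps pace with speech;
    2.0 means it takes twice as long as the utterance itself.
    """
    if audio_seconds < 0 or real_time_factor <= 0:
        raise ValueError("audio length must be >= 0 and the factor must be > 0")
    return audio_seconds * real_time_factor


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Inverse calculation: derive the factor from a timed recognition run."""
    if audio_seconds <= 0:
        raise ValueError("audio length must be > 0")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    # Factors recalled in the thread for a 20,000-word vocabulary.
    for factor in (1.5, 2.5):
        seconds = recognition_time(10, factor)
        print(f"{factor}x real time: a 10 s dictation takes {seconds:.0f} s")
```

At those factors a 10 second utterance takes 15 to 25 seconds to come back, which is why a dictation user notices anything much above 1.0.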
I believe that there are two short-term solutions. The obvious solution is NaturallySpeaking in Wine. This can work, but it will probably be very limited, as it won't be able to interact with Linux applications without a significant amount of work. The second solution is a bridge between NaturallySpeaking on Windows and a remote Linux environment. The goal behind this model is to speed up the delivery of speech recognition driving the Linux environment. All the development effort could be focused on making Linux more accessible using speech recognition. While I wouldn't exactly say that handicapped people don't care about ideology, the goal should be making handicapped people's lives easier and nothing else. If that means using a closed-source product at the core in order to make people's lives better, then do so. Once handicapped people are working and playing with Linux, then start the effort to backfill the closed-source parts of the solution.
Warbo: I don't know much about the technical implementation of such a system, but suppose WINE would run such software well enough (I personally have an old version of ViaVoice and a cut-down version of Dragon NaturallySpeaking lying around somewhere, but have never tested them since moving to Linux). Then surely a "voice recognition running on Windows > Linux applications" tool would be pretty similar to a "voice recognition running in WINE > Linux applications" tool, and there would be no need for Windows. I agree that the current approach to improving voice recognition just involves pouring masses and masses of research data at the problem, which would be incredibly costly in money or time for a free program (I have had discussions with various professors working in and around this field). I just don't see why the cost and resource (and freedom?) overhead of Windows is needed just to send data through some protocol that WINE could also use. That is, integrating a WINE solution into every application would be hard work, but so would integrating a remote/virtualised Windows system. Using a common accessibility protocol between Windows and a Linux interpreter would be easier, but why not use that protocol with WINE instead of integrating it directly?
EricSJohansson You will need a bridge between NaturallySpeaking and Linux whether it runs natively on Windows or in Wine. Bridging between Windows and Linux has some advantages. It simplifies the development process by eliminating additional dependencies and potential problems with Wine, it eliminates licensing problems and potential sabotage by Nuance, it enables the use of different speech recognition packages, and it provides support for users who must work in both Windows and Linux. The development of a bridge is going to be difficult enough without adding Wine into the mix. Wine introduces instability and may leave the user or developer wondering what failed, the bridge or Wine. Wine will be good in the future, but in the short term it will only complicate things.
Nuance has taken an adversarial stance towards its customers: charging fees for bug reports, leaving persistent problems in the subsystems used to interface with many nonsupported applications, aggressive DRM, using the update tool to advertise its own products, and licensing changes forbidding the use of third-party macro packages with any version of NaturallySpeaking except NaturallySpeaking Professional-based products, i.e. its most expensive products. It is difficult to judge how Nuance will react to NaturallySpeaking running under Wine. If they can find a way to generate significant profit, I can see them getting behind it and trying to take control. If they see it as something not worth bothering with (i.e. small profit), then there's a good chance that each revision will force Wine to play catch-up.
A further downside of NaturallySpeaking on Wine is that it extends an existing dependence on Nuance, whereas a system using both Windows and Linux allows the user to choose between NaturallySpeaking and Microsoft speech recognition. While it is not much of a choice, at least it changes the landscape from a monopoly to a duopoly. It's important to recognize that most people, myself included, can't totally walk away from Windows. I need to use Windows applications occasionally because the application isn't available on Linux or because the Windows version works better. An outgrowth of this ability to switch should be the ability to switch between multiple Linux instances, either on virtual machines or across the network. I've always thought the philosophy for speech recognition, or indeed any handicap accessibility interface, should be that you have your own box with all your accessibility capabilities, and that box can drive any other system, thereby making it accessible to you without forcing the remote machine to have all of your accessibility aids as well.
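Comment: To make the bridge idea discussed above a little more concrete, here is a rough Python sketch of what the messages between the speech box and a Linux target might look like. Everything in it is an assumption made for illustration: the `SpeechEvent` type, the "dictate" and "select" event names, the newline-delimited JSON framing, and the `TextBuffer` stand-in for a Linux application are all hypothetical. It does not use any NaturallySpeaking, Wine, or Linux accessibility API, and a real bridge would send the encoded bytes over a network connection rather than a local variable.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional, Tuple


@dataclass
class SpeechEvent:
    """One message from the speech box (hypothetical format)."""
    kind: str     # "dictate" or "select" in this sketch
    payload: str  # dictated text, or the phrase to select


def encode(event: SpeechEvent) -> bytes:
    """Serialize an event as one line of JSON, ready to write to a socket."""
    return (json.dumps(asdict(event)) + "\n").encode("utf-8")


def decode(line: bytes) -> SpeechEvent:
    """Parse one line received from the speech box."""
    data = json.loads(line.decode("utf-8"))
    return SpeechEvent(kind=data["kind"], payload=data["payload"])


class TextBuffer:
    """Stand-in for a text field in a Linux application."""

    def __init__(self) -> None:
        self.text = ""
        self.selection: Optional[Tuple[int, int]] = None

    def handle(self, event: SpeechEvent) -> None:
        if event.kind == "dictate":
            if self.selection is None:
                self.text += event.payload  # plain text injection
            else:
                # Correction: dictation replaces the selected region.
                start, end = self.selection
                self.text = self.text[:start] + event.payload + self.text[end:]
                self.selection = None
        elif event.kind == "select":
            # Select the most recent occurrence of the spoken phrase.
            start = self.text.rfind(event.payload)
            if start == -1:
                raise LookupError(f"phrase not found: {event.payload!r}")
            self.selection = (start, start + len(event.payload))
        else:
            raise ValueError(f"unknown event kind: {event.kind!r}")


# Speech-box side: a misrecognition followed by a spoken correction.
stream = (
    encode(SpeechEvent("dictate", "I use speech wreck ignition"))
    + encode(SpeechEvent("select", "wreck ignition"))
    + encode(SpeechEvent("dictate", "recognition"))
)

# Linux side: apply each received line to the target application.
target = TextBuffer()
for line in stream.splitlines():
    target.handle(decode(line))

print(target.text)
```

The point of the sketch is only that the recognizer side and the Linux side share nothing but a small message format, so the same Linux-side code could in principle be driven by a recognizer running on Windows, in Wine, or by a future open-source engine.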

* JanIschebeck Although it only supports basic commands, the Speechlion project is worth a look. It can be seen as a kind of successor to the xVoice project, which is suspended (as it depends on the ViaVoice library).

EricSJohansson This is a fantastic example of the inadequacy of handicap accessibility solutions. If you had a disability severe enough to require the use of speech recognition, you would never use Speechlion. You may not know this, but these kinds of systems failed in the marketplace in the early 1990s because they didn't solve any problems. It will be interesting if they start doing general dictation. There are two ways to get the experience of what it is like to use speech recognition. The first is to actually use speech recognition on Windows. Start by throwing away your keyboard and mouse. Try simple dictation tasks; correct and edit your documents. Dictate into supported and nonsupported applications. The other simulation would be to have someone type for you from your dictation. Any commands must be followed blindly and cannot be undone. Every utterance must be typed (and not always accurately). Extra points for dictating over a cell phone connection. If you are going to build support for disabled people, live in their world. Make sure you know their pain from your own experience.

SpeechRecognition/Comments (last edited 2008-08-06 16:37:21 by localhost)
