Motivation

A growing number of projects aim to make Unix-like systems accessible to blind people. These include Emacspeak, YASR, XVI for Linux, and Access-Mozilla. Each of these projects must currently implement its own interface to at least one speech synthesizer, and the best ones support several. Each synthesizer has its own commands for stopping speech and setting parameters such as speaking rate and punctuation level. Some of these access tools also filter the text themselves to control pronunciation and the level of punctuation that is spoken. Finally, if more than one of these applications tries to use the speech synthesizer at the same time, conflicts may arise. In general, the complexity of interfacing with speech synthesizers makes it harder to write applications that talk.

Some attempts have been made to separate the speech synthesizer interface from the rest of an application. Probably the first such attempt was the "speech server" in Emacspeak. A speech server is a separate program which takes commands from Emacspeak in the form of Tcl procedure calls and performs the appropriate action for a particular speech synthesizer. However, the design of the speech server interface is centered around the DECtalk synthesizers, since the author of Emacspeak uses a DECtalk. Another problem with this approach is that the communication between an application and a speech server is done through a traditional pipe, so each application that wants to use a speech server must run its own instance, and multiple instances may not do a good job of sharing the same synthesizer.

Another effort to separate the speech synthesizer interface from the rest of the application is the speech/Braille server in the XVI package mentioned earlier. The concept of this approach is good; applications connect to a single server and send text and commands to it. However, the current implementation is proprietary, and the interface doesn't give the user much flexibility in setting speech parameters.

Another attempt at a speech synthesizer interface is the original speechd package. The attractive feature of this interface is its simplicity: to speak, an application simply writes text to /dev/speech (which is technically a FIFO special file). However, this alone is too simple; there is no uniform way to stop the speech or set speech parameters. Also, speechd is a monolithic program whose main loop is re-implemented for each type of synthesizer, so some code is duplicated and there is no mechanism for adding "plug-in" driver modules. Finally, and perhaps most importantly, the project seems to be dead, and its author does not seem to be responsive to suggestions.
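
For illustration, here is a minimal sketch of what that interface looks like from an application's point of view, assuming a speechd-style server is running and has created the /dev/speech FIFO:

    # Write plain text to the /dev/speech FIFO; the server on the other
    # end passes it to the synthesizer.  Opening a FIFO for writing
    # blocks until a reader (the server) has it open.
    speech = open("/dev/speech", "w")
    speech.write("Hello, world.\n")
    speech.close()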

Therefore, I am creating my own interface. It is compatible with applications that use /dev/speech, but I hope that it will provide much more than the original speechd. I have chosen Python as my programming language because its syntax is easy to learn, it's object-oriented, and it enables easy loading of arbitrary modules at run-time. I hope that OISS will become the standard interface for Unix applications that require speech synthesis.
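
As a rough sketch (not OISS's actual code) of the kind of run-time loading Python makes easy, a synthesizer driver could live in an ordinary module that the server imports by name; the "drivers" package and Driver class here are hypothetical:

    import importlib

    def load_driver(name):
        # e.g. load_driver("dectalk") would import drivers/dectalk.py
        # and return an instance of its (hypothetical) Driver class.
        module = importlib.import_module("drivers." + name)
        return module.Driver()

A new synthesizer could then be supported by dropping a new module into the drivers package, without touching the server's main loop.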



If you would like to contact me, email me at mattcampbell@pobox.com or reach me on ICQ.