Chapter 4
Speech Engines: javax.speech
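This chapter introduces the javax.speech package. This package defines the behavior of all speech engines (speech recognizers and synthesizers). The topics covered include:
- What is a Speech Engine?
- Properties of a Speech Engine
- Locating, Selecting and Creating Engines
- Engine States
- Speech Events
- Other Engine Functions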
4.1 What is a Speech Engine?
The javax.speech package of the Java Speech API defines an abstract software representation of a speech engine. "Speech engine" is the generic term for a system designed to deal with either speech input or speech output. Speech synthesizers and speech recognizers are both speech engine instances. Speaker verification systems and speaker identification systems are also speech engines but are not currently supported through the Java Speech API.
The javax.speech package defines classes and interfaces that define the basic functionality of an engine. The javax.speech.synthesis package and javax.speech.recognition package extend and augment the basic functionality to define the specific capabilities of speech synthesizers and speech recognizers.
The Java Speech API makes only one assumption about the implementation of a JSAPI engine: that it provides a true implementation of the Java classes and interfaces defined by the API. In supporting those classes and interfaces, an engine may be completely software-based or may be a combination of software and hardware. The engine may be local to the client computer or may operate remotely on a server. The engine may be written entirely as Java software or may be a combination of Java software and native code.
The basic process for using a speech engine in an application is as follows.
1. Identify the application's functional requirements for an engine (e.g., language or dictation capability).
2. Locate and create an engine that meets those functional requirements.
3. Allocate the resources for the engine.
4. Set up the engine.
5. Begin operation of the engine (technically, resume it).
6. Use the engine.
7. Deallocate the resources of the engine.
Steps 4 and 6 in this process operate differently for the two types of speech engine - recognizer or synthesizer. The other steps apply to all speech engines and are described in the remainder of this chapter.
The "Hello World!" code example for speech synthesis and the "Hello World!" code example for speech recognition both illustrate the seven steps described above. They also show that simple speech applications are straightforward to write with the Java Speech API.
4.2 Properties of a Speech Engine
Applications are responsible for determining their functional requirements for a speech synthesizer and/or speech recognizer. For example, an application might determine that it needs a dictation recognizer for the local language or a speech synthesizer for Korean with a female voice. Applications are also responsible for determining behavior when there is no speech engine available with the required features. Based on specific functional requirements, a speech engine can be selected, created, and started. This section explains how the features of a speech engine are used in engine selection, and how those features are handled in Java software.
Functional requirements are handled in applications as engine selection properties. Each installed speech synthesizer and speech recognizer is defined by a set of properties. An installed engine may have one or many modes of operation, each defined by a unique set of properties, and encapsulated in a mode descriptor object.
The basic engine properties are defined in the EngineModeDesc class. Additional specific properties for speech recognizers and synthesizers are defined by the RecognizerModeDesc and SynthesizerModeDesc classes that are contained in the javax.speech.recognition and javax.speech.synthesis packages respectively.
In addition to mode descriptor objects provided by speech engines to describe their capabilities, an application can create its own mode descriptor objects to indicate its functional requirements. The same Java classes are used for both purposes. An engine-provided mode descriptor describes an actual mode of operation whereas an application-defined mode descriptor defines a preferred or desired mode of operation. (Locating, Selecting and Creating Engines describes the use of a mode descriptor.)
The basic properties defined for all speech engines are listed in Table 4-1.
The one additional property defined by the SynthesizerModeDesc class for speech synthesizers is shown in Table 4-2.
The two additional properties defined by the RecognizerModeDesc class for speech recognizers are shown in Table 4-3.
All three mode descriptor classes, EngineModeDesc, SynthesizerModeDesc and RecognizerModeDesc, use the get and set property patterns for JavaBeans. For example, the Locale property has get and set methods of the form:

    Locale getLocale();
    void setLocale(Locale l);

Furthermore, all the properties are defined by class objects, never by primitives (primitives in the Java programming language include boolean, int etc.). With this design, a null value always represents "don't care" and is used by applications to indicate that a particular property is unimportant to its functionality. For instance, a null value for the "dictation supported" property indicates that dictation is not relevant to engine selection. Since that property is represented by the Boolean class, a value of TRUE indicates that dictation is required and FALSE indicates explicitly that dictation should not be provided.
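The "don't care" convention can be illustrated without the speech classes themselves. The following is a minimal sketch, assuming a hypothetical ModeSketch class with only two properties; it is not part of javax.speech and does not claim to reproduce how the real mode descriptors compare themselves. It only shows why object-valued properties allow three distinct requests: required, excluded and irrelevant.

```java
/** Hypothetical stand-in for a mode descriptor; not a javax.speech class. */
final class ModeSketch {
    final String language;            // null means "don't care"
    final Boolean dictationSupported; // null means "don't care"

    ModeSketch(String language, Boolean dictationSupported) {
        this.language = language;
        this.dictationSupported = dictationSupported;
    }

    /** A null requirement matches anything; otherwise the values must be equal. */
    static boolean propertyMatches(Object required, Object actual) {
        return required == null || required.equals(actual);
    }

    /** True if this (engine-side) mode satisfies every non-null requirement. */
    boolean satisfies(ModeSketch required) {
        return propertyMatches(required.language, language)
            && propertyMatches(required.dictationSupported, dictationSupported);
    }

    public static void main(String[] args) {
        // An English engine mode that does not provide dictation
        ModeSketch engine = new ModeSketch("en", Boolean.FALSE);

        System.out.println(engine.satisfies(new ModeSketch(null, null)));          // true
        System.out.println(engine.satisfies(new ModeSketch(null, Boolean.TRUE)));  // false
        System.out.println(engine.satisfies(new ModeSketch("en", Boolean.FALSE))); // true
    }
}
```

With a primitive boolean, the first request (no opinion about dictation) could not be distinguished from the third (dictation explicitly not wanted).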
4.3 Locating, Selecting and Creating Engines
4.3.1 Default Engine Creation
The simplest way to create a speech engine is to request a default engine. This is appropriate when an application wants an engine for the default locale (specifically for the local language) and does not have any special functional requirements for the engine. The Central class in the javax.speech package is used for locating and creating engines. Default engine creation uses two static methods of the Central class.

    Synthesizer Central.createSynthesizer(EngineModeDesc mode);
    Recognizer Central.createRecognizer(EngineModeDesc mode);

The following code creates a default Recognizer and Synthesizer.

    import javax.speech.*;
    import javax.speech.synthesis.*;
    import javax.speech.recognition.*;

    {
        // Get a synthesizer for the default locale
        Synthesizer synth = Central.createSynthesizer(null);

        // Get a recognizer for the default locale
        Recognizer rec = Central.createRecognizer(null);
    }

For both the createSynthesizer and createRecognizer methods, the null parameters indicate that the application doesn't care about the properties of the synthesizer or recognizer. However, both creation methods have an implicit selection policy. Since the application did not specify the language of the engine, the language from the system's default locale returned by java.util.Locale.getDefault() is used. In all cases of creating a speech engine, the Java Speech API forces language to be considered since it is fundamental to correct engine operation.
If more than one engine supports the default language, the Central class then gives preference to an engine that is running (running property is true), and then to an engine that supports the country defined in the default locale.
If the example above is performed in the US locale, a recognizer and synthesizer for the English language will be returned if one is available. Furthermore, if engines are installed for both British and US English, the US English engine would be created.
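The order of preference described above (language is mandatory, then a running engine, then a matching country) can be sketched in plain Java. This is only an illustration under stated assumptions: the Mode class, the choose method and the scoring scheme are hypothetical, and the text does not say how the Central class actually implements the policy or how it breaks remaining ties.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class DefaultPolicySketch {
    /** Hypothetical description of one installed engine mode. */
    static final class Mode {
        final String name;
        final Locale locale;
        final boolean running;

        Mode(String name, Locale locale, boolean running) {
            this.name = name;
            this.locale = locale;
            this.running = running;
        }
    }

    /** Returns the preferred mode for the given locale, or null if no language match. */
    static Mode choose(List<Mode> installed, Locale defaultLocale) {
        Mode best = null;
        int bestScore = -1;
        for (Mode m : installed) {
            // Language must match; engines for other languages are never chosen
            if (!m.locale.getLanguage().equals(defaultLocale.getLanguage())) {
                continue;
            }
            // Running outranks country, so it carries the larger weight
            int score = (m.running ? 2 : 0)
                      + (m.locale.getCountry().equals(defaultLocale.getCountry()) ? 1 : 0);
            if (score > bestScore) {
                best = m;
                bestScore = score;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Mode> installed = Arrays.asList(
            new Mode("British English", Locale.UK, false),
            new Mode("US English", Locale.US, false),
            new Mode("German", Locale.GERMANY, true));
        System.out.println(choose(installed, Locale.US).name); // US English
    }
}
```

Under this sketch, a running British English engine would be chosen ahead of a US English engine that is not running, because the running preference is applied before the country preference.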
4.3.2 Simple Engine Creation
The next easiest way to create an engine is to create a mode descriptor, define desired engine properties and pass the descriptor to the appropriate engine creation method of the Central class. When the mode descriptor passed to the createSynthesizer or createRecognizer methods is non-null, an engine is created which matches all of the properties defined in the descriptor. If no suitable engine is available, the methods return null.
The list of properties is described in Properties of a Speech Engine (Section 4.2). All the properties in EngineModeDesc and its sub-classes RecognizerModeDesc and SynthesizerModeDesc default to null to indicate "don't care".
The following code sample shows a method that creates a dictation-capable recognizer for the default locale. It returns null if no suitable engine is available.

    /** Get a dictation recognizer for the default locale */
    Recognizer createDictationRecognizer() throws EngineException
    {
        // Create a mode descriptor with all required features
        RecognizerModeDesc required = new RecognizerModeDesc();
        required.setDictationGrammarSupported(Boolean.TRUE);
        return Central.createRecognizer(required);
    }

Since the required object provided to the createRecognizer method does not have a specified locale (it is not set, so it is null) the Central class again enforces a policy of selecting an engine for the language specified in the system's default locale. The Central class will also give preference to running engines and then to engines that support the country defined in the default locale.
In the next example we create a Synthesizer for Spanish with a male voice.

    /**
     * Return a speech synthesizer for Spanish.
     * Return null if no such engine is available.
     */
    Synthesizer createSpanishSynthesizer() throws EngineException
    {
        // Create a mode descriptor with all required features
        // "es" is the ISO 639 language code for "Spanish"
        SynthesizerModeDesc required = new SynthesizerModeDesc();
        // The empty country string leaves the country unspecified;
        // the Locale constructor does not accept null arguments
        required.setLocale(new Locale("es", ""));
        required.addVoice(new Voice(
            null, Voice.GENDER_MALE, Voice.AGE_DONT_CARE, null));
        return Central.createSynthesizer(required);
    }

Again, the method returns null if no matching synthesizer is found and the application is responsible for determining how to handle the situation.
4.3.3 Advanced Engine Selection
This section explains more advanced mechanisms for locating and creating speech engines. Most applications do not need to use these mechanisms. Readers may choose to skip this section.
In addition to performing engine creation, the Central class can provide lists of available recognizers and synthesizers from two static methods.

    EngineList availableSynthesizers(EngineModeDesc mode);
    EngineList availableRecognizers(EngineModeDesc mode);

If the mode passed to either method is null, then all known speech recognizers or synthesizers are returned. Unlike the createRecognizer and createSynthesizer methods, there is no policy that restricts the list to the default locale or to running engines; in advanced selection such decisions are the responsibility of the application.
Both availableSynthesizers and availableRecognizers return an EngineList object, a sub-class of Vector. If there are no available engines, or no engines that match the properties defined in the mode descriptor, the list is zero length (not null) and its isEmpty method returns true. Otherwise the list contains a set of SynthesizerModeDesc or RecognizerModeDesc objects each defining a mode of operation of an engine. These mode descriptors are engine-defined so all their features are defined (non-null) and applications can test these features to refine the engine selection.
Because EngineList is a sub-class of Vector, each element it contains is a Java Object. Thus, when accessing the elements applications need to cast the objects to EngineModeDesc, SynthesizerModeDesc or RecognizerModeDesc.
The following code shows how an application can obtain a list of speech synthesizers with a female voice for German. All other parameters of the mode descriptor remain null for "don't care" (engine name, mode name etc.).
    import javax.speech.*;
    import javax.speech.synthesis.*;
    import java.util.Locale;

    // Define the set of required properties in a mode descriptor
    SynthesizerModeDesc required = new SynthesizerModeDesc();
    required.setLocale(new Locale("de", ""));
    required.addVoice(new Voice(
        null, Voice.GENDER_FEMALE, Voice.AGE_DONT_CARE, null));

    // Get the list of matching engine modes
    EngineList list = Central.availableSynthesizers(required);

    // Test whether the list is empty - any suitable synthesizers?
    if (list.isEmpty()) ...

If the application specifically wanted Swiss German and a running engine it would add the following before calling availableSynthesizers:

    required.setLocale(new Locale("de", "CH"));
    required.setRunning(Boolean.TRUE);

To create a speech engine from a mode descriptor obtained through the availableSynthesizers and availableRecognizers methods, an application simply calls the createSynthesizer or createRecognizer method. Because the engine created the mode descriptor and because it provided values for all the properties, it has sufficient information to create the engine directly. An example later in this section illustrates the creation of a Recognizer from an engine-provided mode descriptor.
Although applications do not normally care, engine-provided mode descriptors are special in two other ways. First, all engine-provided mode descriptors are required to implement the EngineCreate interface which includes a single createEngine method. The Central class uses this interface to perform the creation. Second, engine-provided mode descriptors may extend the SynthesizerModeDesc and RecognizerModeDesc classes to encapsulate additional features and information. Applications should not access that information if they want to be portable, but engines will use that information when creating a running Synthesizer or Recognizer.
4.3.3.1 Refining an Engine List
If more than one engine matches the required properties provided to availableSynthesizers or availableRecognizers then the list will have more than one entry and the application must choose from amongst them.
In the simplest case, applications simply select the first entry in the list, which is obtained using the firstElement method that EngineList inherits from Vector. For example:

    EngineModeDesc required;
    ...
    EngineList list = Central.availableRecognizers(required);

    if (!list.isEmpty()) {
        EngineModeDesc desc = (EngineModeDesc)(list.firstElement());
        Recognizer rec = Central.createRecognizer(desc);
    }

More sophisticated selection algorithms may test additional properties of the available engine. For example, an application may give precedence to a synthesizer mode that has a voice called "Victoria".
The list manipulation methods of the EngineList class are convenience methods for advanced engine selection.
- anyMatch(EngineModeDesc) returns true if at least one mode descriptor in the list has the required properties.
- requireMatch(EngineModeDesc) removes elements from the list that do not match the required properties.
The following code shows how to use these methods to obtain a Spanish dictation recognizer with preference given to a recognizer that has been trained for a specified speaker passed as an input parameter.

    import javax.speech.*;
    import javax.speech.recognition.*;
    import java.util.Locale;

    Recognizer getSpanishDictation(String name)
    {
        RecognizerModeDesc required = new RecognizerModeDesc();
        required.setLocale(new Locale("es", ""));
        required.setDictationGrammarSupported(Boolean.TRUE);

        // Get a list of Spanish dictation recognizers
        EngineList list = Central.availableRecognizers(required);

        if (list.isEmpty()) return null; // nothing available

        // Create a description for an engine trained for the speaker
        SpeakerProfile profile = new SpeakerProfile(null, name, null);
        RecognizerModeDesc requireSpeaker = new RecognizerModeDesc();
        requireSpeaker.addSpeakerProfile(profile);

        // Prune list if any recognizers have been trained for speaker
        if (list.anyMatch(requireSpeaker))
            list.requireMatch(requireSpeaker);

        // Now try to create the recognizer
        RecognizerModeDesc first =
            (RecognizerModeDesc)(list.firstElement());
        try {
            return Central.createRecognizer(first);
        } catch (SpeechException e) {
            return null;
        }
    }
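The "prefer but do not require" pattern used in that example (prune the list only if pruning would leave something) can be tried out without a speech engine. The following is a minimal sketch over an ordinary java.util.List; the class EngineListSketch and its methods are hypothetical, and a Predicate stands in for the mode descriptor that the real anyMatch and requireMatch methods accept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class EngineListSketch {
    /** True if at least one entry has the required properties. */
    static <T> boolean anyMatch(List<T> modes, Predicate<T> required) {
        for (T mode : modes) {
            if (required.test(mode)) {
                return true;
            }
        }
        return false;
    }

    /** Removes every entry that does not have the required properties. */
    static <T> void requireMatch(List<T> modes, Predicate<T> required) {
        modes.removeIf(required.negate());
    }

    /** Returns the first preferred entry if any exist, else the first entry, else null. */
    static <T> T preferThenFirst(List<T> modes, Predicate<T> preferred) {
        if (modes.isEmpty()) {
            return null;
        }
        List<T> candidates = new ArrayList<>(modes); // leave the caller's list untouched
        if (anyMatch(candidates, preferred)) {
            requireMatch(candidates, preferred);
        }
        return candidates.get(0);
    }

    public static void main(String[] args) {
        List<String> modes = Arrays.asList("generic", "trained-for-maria");
        System.out.println(preferThenFirst(modes, s -> s.contains("maria"))); // trained-for-maria
        System.out.println(preferThenFirst(modes, s -> s.contains("juan")));  // generic
    }
}
```

Checking anyMatch before calling requireMatch is what keeps the preference from emptying the list when no engine has been trained for the speaker.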
4.4 Engine States
4.4.1 State systems
The Engine interface includes a set of methods that define a generalized state system manager. Here we consider the operation of those methods. In the following sections we consider the two core state systems implemented by all speech engines: the allocation state system and the pause-resume state system. In Chapter 5, the state system for synthesizer queue management is described. In Chapter 6, the state systems for recognizer focus and for recognition activity are described.
A state defines a particular mode of operation of a speech engine. For example, the output queue of a synthesizer moves between the QUEUE_EMPTY and QUEUE_NOT_EMPTY states. The following are the basics of state management.
The getEngineState method of the Engine interface returns the current engine state. The engine state is represented by a long value (64-bit value). Specified bits of the state represent the engine being in specific states. This bit-wise representation is used because an engine can be in more than one state at a time, and usually is during normal operation.
Every speech engine must be in one and only one of the four allocation states (described in detail in Section 4.4.2). These states are DEALLOCATED, ALLOCATED, ALLOCATING_RESOURCES and DEALLOCATING_RESOURCES. The ALLOCATED state has multiple sub-states. Any ALLOCATED engine must be in either the PAUSED or the RESUMED state (described in detail in Section 4.4.4).
Synthesizers have a separate sub-state system for queue status. Like the paused/resumed state system, the QUEUE_EMPTY and QUEUE_NOT_EMPTY states are both sub-states of the ALLOCATED state. Furthermore, the queue status and the paused/resumed status are independent.
Recognizers have three independent sub-state systems to the ALLOCATED state (the PAUSED/RESUMED system plus two others). The LISTENING, PROCESSING and SUSPENDED states indicate the current activity of the recognition process. The FOCUS_ON and FOCUS_OFF states indicate whether the recognizer currently has speech focus. For a recognizer, all three sub-state systems of the ALLOCATED state operate independently (with some exceptions that are discussed in the recognition chapter).
Each of these state names is represented by a static long in which a single unique bit is set. The & and | operators of the Java programming language are used to manipulate these state bits. For example, the state of an allocated, resumed synthesizer with an empty speech output queue is defined by:
    (Engine.ALLOCATED | Engine.RESUMED | Synthesizer.QUEUE_EMPTY)

To test whether an engine is resumed, we use the test:

    if ((engine.getEngineState() & Engine.RESUMED) != 0) ...

For convenience, the Engine interface defines two additional methods for handling engine states. The testEngineState method is passed a state value and returns true if all the state bits in that value are currently set for the engine. Again, to test whether an engine is resumed, we use the test:

    if (engine.testEngineState(Engine.RESUMED)) ...

Technically, the testEngineState(state) method is equivalent to:

    if ((engine.getEngineState() & state) == state) ...

The final state method is waitEngineState. This method blocks the calling thread until the engine reaches the defined state. For example, to wait until a synthesizer stops speaking because its queue is empty we use:

    engine.waitEngineState(Synthesizer.QUEUE_EMPTY);

In addition to method calls, applications can monitor state through the event system. Every state transition is marked by an EngineEvent being issued to each EngineListener attached to the Engine. The EngineEvent class is extended by the SynthesizerEvent and RecognizerEvent classes for state transitions that are specific to those engines. For example, the RECOGNIZER_PROCESSING RecognizerEvent indicates a transition from the LISTENING state to the PROCESSING state (which indicates that the recognizer has detected speech and is producing a result).
4.4.2 Allocation State System
Engine allocation is the process in which the resources required by a speech recognizer or synthesizer are obtained. Engines are not automatically allocated when created because speech engines can require substantial resources (CPU, memory and disk space) and because they may need exclusive access to an audio resource (e.g. microphone input or speaker output). Furthermore, allocation can be a slow procedure for some engines (perhaps a few seconds or over a minute).
The allocate method of the Engine interface requests the engine to perform allocation and is usually one of the first calls made to a created speech engine. A newly created engine is always in the DEALLOCATED state. A call to the allocate method is, technically speaking, a request to the engine to transition to the ALLOCATED state. During the transition, the engine is in a temporary ALLOCATING_RESOURCES state.
The deallocate method of the Engine interface requests the engine to perform deallocation of its resources. All well-behaved applications call deallocate once they have finished using an engine so that its resources are freed up for other applications. The deallocate method returns the engine to the DEALLOCATED state. During the transition, the engine is in a temporary DEALLOCATING_RESOURCES state.
Figure 4-1 shows the state diagram for the allocation state system.
![]()
Each block represents a state of the engine. An engine must always be in one of the four specified states. As the engine transitions between states, the event labelled on the transition arc is issued to the EngineListener objects attached to the engine.
The normal operational state of an engine is ALLOCATED. The paused-resumed state of an engine is described in the next section. The sub-state systems of ALLOCATED synthesizers and recognizers are described in Chapter 5 and Chapter 6 respectively.
4.4.3 Allocated States and Call Blocking
For advanced applications, it is often desirable to start up the allocation of a speech engine in a background thread while other parts of the application are being initialized. This can be achieved by calling the allocate method in a separate thread. The following code shows an example of this using an inner class implementation of the Runnable interface. To determine when the allocation method is complete, we check later in the code for the engine being in the ALLOCATED state.
    Engine engine;

    {
        // A null mode descriptor requests a default recognizer
        engine = Central.createRecognizer(null);

        new Thread(new Runnable() {
            public void run() {
                try {
                    engine.allocate();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }).start();

        // Do other stuff while allocation takes place
        ...

        // Now wait until allocation is complete
        engine.waitEngineState(Engine.ALLOCATED);
    }

A full implementation of an application that uses this approach to engine allocation needs to consider the possibility that the allocation fails. In that case, the allocate method throws an EngineException and the engine returns to the DEALLOCATED state.
Another issue advanced applications need to consider is call blocking. Most methods of the Engine, Recognizer and Synthesizer interfaces are defined for normal operation in the ALLOCATED state. What if they are called for an engine in another allocation state? For most methods, the operation is defined as follows:
- ALLOCATED state: for nearly all methods normal behavior is defined for this state. (An exception is the allocate method.)
- ALLOCATING_RESOURCES state: most methods block in this state. The calling thread waits until the engine reaches the ALLOCATED state. Once that state is reached, the method behaves as normally defined.
- DEALLOCATED state: most methods are not defined for this state, so an EngineStateError is thrown. (Exceptions include the allocate method and certain methods listed below.)
- DEALLOCATING_RESOURCES state: most methods are not defined for this state, so an EngineStateError is thrown.
A small subset of engine methods will operate correctly in all engine states. The getEngineProperties method always allows runtime engine properties to be set and tested (although properties only take effect in the ALLOCATED state). The getEngineModeDesc method can always return the mode descriptor for the engine. Finally, the three engine state methods (getEngineState, testEngineState and waitEngineState) always operate as defined.
4.4.4 Pause - Resume State System
All ALLOCATED speech engines have PAUSED and RESUMED states. Once an engine reaches the ALLOCATED state, it enters either the PAUSED or the RESUMED state. The factors that affect the initial PAUSED/RESUMED state are described below.
The PAUSED/RESUMED state indicates whether the audio input or output of the engine is on or off. A resumed recognizer is receiving audio input. A paused recognizer is ignoring audio input. A resumed synthesizer produces audio output as it speaks. A paused synthesizer is not producing audio output.
As part of the engine state system, the Engine interface provides several methods to test the PAUSED/RESUMED state. The general state system is described previously in Section 4.4.1.
An application controls an engine's PAUSED/RESUMED state with the pause and resume methods. An application may pause or resume an engine indefinitely. Each time the PAUSED/RESUMED state changes, an ENGINE_PAUSED or ENGINE_RESUMED type of EngineEvent is issued to each EngineListener attached to the Engine.
Figure 4-2 shows the basic pause and resume diagram for a speech engine. As a sub-state system of the ALLOCATED state, the pause and resume states are represented within the ALLOCATED state as shown in Figure 4-1.
![]()
As with Figure 4-1, Figure 4-2 represents states as labelled blocks, and the engine events as labelled arcs between those blocks. In this diagram the large block is the ALLOCATED state which contains both the PAUSED and RESUMED states.
4.4.5 State Sharing
The PAUSED/RESUMED state of a speech engine may, in many situations, be shared by multiple applications. Here we must make a distinction between the Java object that represents a Recognizer or Synthesizer and the underlying engine that may have multiple Java and non-Java applications connected to it. For example, in personal computing systems (e.g., desktops and laptops), there is typically a single engine running and connected to microphone input or speaker/headphone output and all applications share that resource.
When a Recognizer or Synthesizer (the Java software object) is paused and resumed the shared underlying engine is paused and resumed and all applications connected to that engine are affected.
There are three key implications from this architecture:
- An application should pause and resume an engine only in response to a user request (e.g., because a microphone button is pressed for a recognizer). For example, it should not pause an engine before deallocating it.
- A Recognizer or Synthesizer may be paused and resumed because of a request by another application. The application will receive an ENGINE_PAUSED or ENGINE_RESUMED event and the engine state value is updated to reflect the current engine state.
- Because an engine could be resumed without the application explicitly requesting a resume, the application should always be prepared for that resume. For example, it should not place text on the synthesizer's output queue unless it would expect it to be spoken upon a resume. Similarly, the set of enabled grammars of a recognizer should always be appropriate to the application context, and the application should be prepared to accept input results from the recognizer if it is unexpectedly resumed.
4.4.6 Synthesizer Pause
For a speech synthesizer - a speech output device - pause immediately stops the audio output of synthesized speech. Resume recommences speech output from the point at which the pause took effect. This is analogous to pause and resume on a tape player or CD player.
Chapter 5 describes an additional state system of synthesizers. An ALLOCATED Synthesizer has sub-states for QUEUE_EMPTY and QUEUE_NOT_EMPTY. This represents whether there is text on the speech output queue of the synthesizer that is being spoken or waiting to be spoken. The queue state and pause/resume state are independent. It is possible, for example, for a RESUMED synthesizer to have an empty output queue (QUEUE_EMPTY state). In this case, the synthesizer is silent because it has nothing to say. If any text is provided to be spoken, speech output will start immediately because the synthesizer is RESUMED.
4.4.7 Recognizer Pause
For a recognizer, pausing and resuming turns audio input off and on and is analogous to switching the microphone off and on. When audio input is off the audio is lost. Unlike a synthesizer, for which a resume continues speech output from the point at which it was paused, resuming a recognizer restarts the processing of audio input from the time at which resume is called.
Under normal circumstances, pausing a recognizer will stop the recognizer's internal processes that match audio against grammars. If the user was in the middle of speaking at the instant at which the recognizer was paused, the recognizer is forced to finalize its recognition process. This is because a recognizer cannot assume that the audio received just before pausing is in any way linked to the audio data that it will receive after being resumed. Technically speaking, pausing introduces a discontinuity into the audio input stream.
One complexity for pausing and resuming a recognizer (not relevant to synthesizers) is the role of internal buffering. For various reasons, described in Chapter 6, a recognizer has a buffer for audio input which mediates between the audio device and the internal components of the recognizer which perform the match of the audio to the grammars. If the recognizer is performing in real-time the buffer is empty or nearly empty. If the recognizer is temporarily suspended or operates slower than real-time, then the buffer may contain seconds of audio or more.
When a recognizer is paused, the pause takes effect on the input end of the buffer; i.e., the recognizer stops putting data into the buffer. At the other end of the buffer, where the actual recognition is performed, the recognizer continues to process audio data until the buffer is empty. This means that the recognizer can continue to produce recognition results for a limited period of time even after it has been paused. (A Recognizer also provides a forceFinalize method with an option to flush the audio input buffer.)
Chapter 6 describes an additional state system of recognizers. An ALLOCATED Recognizer has a separate sub-state system for LISTENING, PROCESSING and SUSPENDED. These states indicate the current activity of the internal recognition process. These states are largely decoupled from the PAUSED and RESUMED states except that, as described in detail in Chapter 6, a paused recognizer eventually returns to the LISTENING state when it runs out of audio input (the LISTENING state indicates that the recognizer is listening to background silence, not to speech).
The SUSPENDED state of a Recognizer is superficially similar to the PAUSED state. In the SUSPENDED state the recognizer is not processing audio input from the buffer, but is temporarily halted while an application updates its grammars. A key distinction between the PAUSED state and the SUSPENDED state is that in the SUSPENDED state audio input can still be coming into the audio input buffer. When the recognizer leaves the SUSPENDED state the audio is processed. The SUSPENDED state allows a user to continue talking to the recognizer even while the recognizer is temporarily SUSPENDED. Furthermore, by updating grammars in the SUSPENDED state, an application can apply multiple grammar changes instantaneously with respect to the audio input stream.
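Because the pause/resume sub-state and the recognition-activity sub-state occupy separate bits of the engine state (Section 4.4.1), a paused recognizer can still be reported as processing until its buffer drains. The following is a minimal sketch of that bit-wise bookkeeping. The constant values and the transition helper are hypothetical and chosen only for illustration; the real constants are defined by the Engine and Recognizer interfaces, and only the test method mirrors the testEngineState equivalence given in the text.

```java
final class StateBitsSketch {
    // Hypothetical bit assignments; each state has a single unique bit set
    static final long ALLOCATED  = 1L << 0;
    static final long PAUSED     = 1L << 1;
    static final long RESUMED    = 1L << 2;
    static final long LISTENING  = 1L << 3;
    static final long PROCESSING = 1L << 4;
    static final long SUSPENDED  = 1L << 5;

    /** True only if every bit in mask is set in engineState. */
    static boolean test(long engineState, long mask) {
        return (engineState & mask) == mask;
    }

    /** Swaps one sub-state for another and leaves all other bits untouched. */
    static long transition(long engineState, long from, long to) {
        return (engineState & ~from) | to;
    }

    public static void main(String[] args) {
        // A resumed recognizer in the middle of processing speech
        long state = ALLOCATED | RESUMED | PROCESSING;

        // Pausing changes only the pause/resume sub-state
        state = transition(state, RESUMED, PAUSED);
        System.out.println(test(state, PAUSED | PROCESSING)); // true
        System.out.println(test(state, RESUMED));             // false

        // Once the buffered audio runs out, the activity sub-state changes
        state = transition(state, PROCESSING, LISTENING);
        System.out.println(test(state, ALLOCATED | PAUSED | LISTENING)); // true
    }
}
```

Testing with `== mask` rather than `!= 0` matters when the mask combines several states, since the second form would succeed if any one of the bits were set.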
4.5 Speech Events
Speech engines, both recognizers and synthesizers, generate many types of events. Applications are not required to handle all events; however, some events are particularly important for implementing speech applications. For example, some result events must be processed to receive recognized text from a recognizer.
Java Speech API events follow the JavaBeans event model. Events are issued to a listener attached to an object involved in generating that event. All the speech events are derived from the SpeechEvent class in the javax.speech package.
The events of the javax.speech package are listed in Table 4-4.
The events of the javax.speech.synthesis package are listed in Table 4-5.
The events of the javax.speech.recognition package are listed in Table 4-6.
4.5.1 Event Synchronization
A speech engine is required to provide all its events in synchronization with the AWT event queue whenever possible. The reason for this constraint is that it simplifies the integration of speech events with AWT events and the Java Foundation Classes events (e.g., keyboard, mouse and focus events). This constraint does not adversely affect applications that do not provide graphical interfaces.
Synchronization with the AWT event queue means that the AWT event queue is not issuing another event when the speech event is being issued. To implement this, speech engines need to place speech events onto the AWT event queue. The queue is obtained through the AWT Toolkit:
EventQueue q = Toolkit.getDefaultToolkit().getSystemEventQueue();
The EventQueue runs a separate thread for event dispatch. Speech engines are not required to issue the events through that thread, but should ensure that thread is blocked while the speech event is issued.
Note that SpeechEvent is not a sub-class of AWTEvent, and that speech events are not actually placed directly on the AWT event queue. Instead, a speech engine performs internal activities to keep its internal speech event queue synchronized with the AWT event queue to make an application developer's life easier.
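One simple way to obtain this kind of synchronization is to run the listener callbacks on the AWT event dispatch thread itself, using the standard EventQueue.invokeAndWait method. The sketch below takes that approach; it is not necessarily how any particular engine does it, since engines are free to use another thread as long as the dispatch thread is blocked. ToySpeechListener and ToyEventDispatcher are hypothetical names, not JSAPI types, and the event is reduced to a String.

```java
import java.awt.EventQueue;
import java.lang.reflect.InvocationTargetException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a speech listener interface; not a JSAPI type.
interface ToySpeechListener {
    void speechEvent(String description);
}

// Hypothetical engine-side helper that delivers events in step with AWT.
class ToyEventDispatcher {
    private final List<ToySpeechListener> listeners = new ArrayList<>();

    void addListener(ToySpeechListener listener) {
        listeners.add(listener);
    }

    // Runs every listener on the AWT event dispatch thread and waits for
    // completion, so no AWT event is dispatched while the speech event is.
    void fire(String description) throws InterruptedException, InvocationTargetException {
        Runnable deliver = () -> {
            for (ToySpeechListener listener : listeners) {
                listener.speechEvent(description);
            }
        };
        if (EventQueue.isDispatchThread()) {
            deliver.run(); // already on the dispatch thread
        } else {
            EventQueue.invokeAndWait(deliver);
        }
    }

    public static void main(String[] args) throws Exception {
        ToyEventDispatcher dispatcher = new ToyEventDispatcher();
        dispatcher.addListener(description ->
            System.out.println(description + " on dispatch thread: "
                + EventQueue.isDispatchThread()));
        dispatcher.fire("ENGINE_RESUMED");
    }
}
```

A listener written this way can update AWT or Swing components directly, because it runs where those components expect to be modified.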
4.6 Other Engine Functions
4.6.1 Runtime Engine Properties
Speech engines each have a set of properties that can be changed while the engine is running. The EngineProperties interface defined in the javax.speech package is the root interface for accessing runtime properties. It is extended by the SynthesizerProperties interface defined in the javax.speech.synthesis package, and the RecognizerProperties interface defined in the javax.speech.recognition package.
For any engine, the EngineProperties is obtained by calling the getEngineProperties method defined in the Engine interface. To avoid casting the return object, the getSynthesizerProperties method of the Synthesizer interface and the getRecognizerProperties method of the Recognizer interface are also provided to return the appropriate type. For example:
{
    Recognizer rec = ...;
    RecognizerProperties props = rec.getRecognizerProperties();
}
The EngineProperties interface provides three types of functionality.
- The addPropertyChangeListener and removePropertyChangeListener methods add or remove a JavaBeans PropertyChangeListener. The listener receives an event notification any time a property value changes.
- The reset method returns all properties to their default values.
- The getControlComponent method returns an engine-provided AWT Component or null if one is not provided by the engine. This component can be displayed for a user to modify the engine properties. In some cases this component may allow customization of properties that are not programmatically accessible.
The SynthesizerProperties and RecognizerProperties interfaces define the sets of runtime features of those engine types. The specific properties defined by these interfaces are described in Chapter 5 and Chapter 6 respectively.
For each property there is a get and a set method, both using the JavaBeans property patterns. For example, the methods for handling a synthesizer's speaking volume are:
float getVolume()
void setVolume(float volume) throws PropertyVetoException;
The get method returns the current setting. The set method attempts to set a new volume. A set method throws an exception if it fails. Typically, this is because the engine rejects the set value. In the case of volume, the legal range is 0.0 to 1.0. Values outside of this range cause an exception.
The set methods of the SynthesizerProperties and RecognizerProperties interfaces are asynchronous: they may return before the property change takes effect. For example, a change in the voice of a synthesizer may be deferred until the end of the current word, the current sentence or even the current document. So that an application knows when a change occurs, a PropertyChangeEvent is issued to each PropertyChangeListener attached to the properties object.
A property change event may also be issued because another application has changed a property, because changing one property affects another (e.g., changing a synthesizer's voice from male to female will usually cause an increase in the pitch setting), or because the property values have been reset.
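The get/set pattern, the veto on illegal values and the change notification described in this section can be sketched with the standard java.beans classes. ToyVolumeProperties below is a hypothetical stand-in, not a JSAPI class. It also simplifies one point: the change is applied and the event is fired synchronously inside the setter, whereas a real engine may apply the change later.

```java
import java.beans.PropertyChangeEvent;
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.beans.PropertyVetoException;

// Hypothetical stand-in for an engine properties object; not a JSAPI class.
class ToyVolumeProperties {
    private final PropertyChangeSupport support = new PropertyChangeSupport(this);
    private float volume = 0.5f;

    void addPropertyChangeListener(PropertyChangeListener listener) {
        support.addPropertyChangeListener(listener);
    }

    void removePropertyChangeListener(PropertyChangeListener listener) {
        support.removePropertyChangeListener(listener);
    }

    float getVolume() {
        return volume;
    }

    // Rejects values outside 0.0 to 1.0, otherwise stores the value and
    // notifies listeners (synchronously here, unlike a real engine).
    void setVolume(float newVolume) throws PropertyVetoException {
        Float oldValue = Float.valueOf(volume);
        Float newValue = Float.valueOf(newVolume);
        if (Float.isNaN(newVolume) || newVolume < 0.0f || newVolume > 1.0f) {
            throw new PropertyVetoException("volume must be between 0.0 and 1.0",
                new PropertyChangeEvent(this, "volume", oldValue, newValue));
        }
        volume = newVolume;
        support.firePropertyChange("volume", oldValue, newValue);
    }

    public static void main(String[] args) {
        ToyVolumeProperties props = new ToyVolumeProperties();
        props.addPropertyChangeListener(event ->
            System.out.println(event.getPropertyName() + " changed to " + event.getNewValue()));
        try {
            props.setVolume(0.8f); // accepted, listener is notified
            props.setVolume(1.5f); // out of range, rejected
        } catch (PropertyVetoException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Since a real set method may return before the change takes effect, an application that needs to know the effective value would rely on the listener rather than on reading the property back immediately after the call.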
4.6.2 Audio Management
The AudioManager of a speech engine is provided for management of the engine's speech input or output. For the Java Speech API Version 1.0 specification, the AudioManager interface is minimal. As the audio streaming interfaces for the Java platform are established, the AudioManager interface will be enhanced for more advanced functionality.
For this release, the AudioManager interface defines the ability to attach and remove AudioListener objects. For this release, the AudioListener interface is simple: it is empty. However, the RecognizerAudioListener interface extends the AudioListener interface to receive three audio event types (SPEECH_STARTED, SPEECH_STOPPED and AUDIO_LEVEL events). These events are described in detail in Chapter 6. As a type of AudioListener, a RecognizerAudioListener is attached and removed through the AudioManager.
4.6.3 Vocabulary Management
An engine can optionally provide a VocabManager for control of the pronunciation of words and other vocabulary. This manager is obtained by calling the getVocabManager method of a Recognizer or Synthesizer (it is a method of the Engine interface). If the engine does not support vocabulary management, the method returns null.
The manager defines a list of Word objects. Words can be added to the VocabManager, removed from the VocabManager, and searched through the VocabManager.
The Word class is defined in the javax.speech package. Each Word is defined by the following features.
- Spoken form: an optional String that indicates how the Word is spoken. For English, the spoken form might be used for defining how acronyms are spoken. For Japanese, the spoken form could provide a kana representation of how kanji in the written form is pronounced.
- Pronunciations: an optional String array containing one or more phonemic representations of the pronunciations of the Word. The International Phonetic Alphabet subset of Unicode is used throughout the Java Speech API for representing pronunciations.
- Grammatical categories: an optional set of or'ed grammatical categories. The Word class defines 16 different classes of words (noun, verb, conjunction etc.). These classes do not represent a complete linguistic breakdown of all languages. Instead they are intended to provide a Recognizer or Synthesizer with additional information about a word that may assist in correctly recognizing or correctly speaking it.
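As a reminder of what "or'ed" categories means in practice, the sketch below combines category flags with bitwise OR and tests them with bitwise AND. ToyWordCategories and its constant values are hypothetical and chosen only for illustration; the real category constants are those defined by the Word class.

```java
// Hypothetical flag values for illustration only; the real category
// constants are defined by the Word class in javax.speech.
final class ToyWordCategories {
    static final long NOUN = 1L << 0;
    static final long VERB = 1L << 1;
    static final long ADJECTIVE = 1L << 2;

    private ToyWordCategories() {}

    // True if the or'ed set 'categories' includes the single flag 'category'.
    static boolean hasCategory(long categories, long category) {
        return (categories & category) != 0;
    }

    public static void main(String[] args) {
        // "record" can be used as a noun or as a verb, so both flags are set.
        long record = NOUN | VERB;
        System.out.println("verb? " + hasCategory(record, VERB));
        System.out.println("adjective? " + hasCategory(record, ADJECTIVE));
    }
}
```

A single word that can play several grammatical roles therefore needs only one category value, with one bit set per role.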
Java™ Speech API Programmer's Guide
Copyright © 1997-1998
Sun Microsystems, Inc.
All rights reserved
Send comments or corrections to javaspeech-comments@sun.com