JSAPI 2.0

Java Speech API Specification, 2.0.


Java Speech Packages
javax.speech               Provides interfaces and classes for connecting to speech Engines.
javax.speech.recognition   Provides interfaces and classes for using speech Recognizers.
javax.speech.spi           Provides interfaces for implementing speech Engines.
javax.speech.synthesis     Provides interfaces and classes for using speech Synthesizers.


Description

The Java Speech API (JSAPI 2.0) allows developers to easily incorporate speech technology into their Java programming language MIDlets (or applets) and applications. This API specifies a cross-platform interface to support speech recognizers and speech synthesizers, with considerations for future incorporation of dictation and other speech technologies.

This compact and efficient API supports both speech recognition and speech synthesis technologies either separately or together. Speech Engines may range in size depending on supported capabilities and application needs. Recognition Engines may provide full support for application-defined grammars or provide more limited support through specialized built-in grammars. Synthesis Engines may support full text-to-speech capabilities or simple text and audio sequencing.

JSAPI 2.0 is primarily aimed at the Java ME platform (specifically CLDC 1.0 and MIDP 1.0). At the same time, it works well on Java SE by avoiding platform-specific APIs.

JSAPI 2.0 applies to many applications. It is designed to support not only the limited cases of name dialing and digit dialing, but also a wide range of richer speech applications.

JSAPI 2.0 is especially well suited for downloadable applications, opening rich new interaction possibilities on devices with limited size yet unlimited potential.

JSAPI 2.0 is designed both for the average Java programmer and for people with speech expertise who need more detailed control. To get started, please see the "Hello World" recognition example and synthesis example. The Table of Examples provides additional examples, the Table of Diagrams provides links to helpful diagrams, and the FAQ below provides additional information.
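
For instance, a minimal synthesis sketch in the spirit of the specification's "Hello World" example might look like the following (error handling is simplified; SynthesizerMode.DEFAULT and the state constants follow the usual JSAPI 2.0 pattern):

    import javax.speech.Engine;
    import javax.speech.EngineManager;
    import javax.speech.synthesis.Synthesizer;
    import javax.speech.synthesis.SynthesizerMode;

    public class HelloSynthesis {
        public static void main(String[] args) throws Exception {
            // Request any available Synthesizer matching the default mode.
            Synthesizer synth =
                (Synthesizer) EngineManager.createEngine(SynthesizerMode.DEFAULT);

            // Acquire Engine resources and wait for allocation to complete.
            synth.allocate();
            synth.waitEngineState(Engine.ALLOCATED);

            // Queue plain text and wait until it has been spoken.
            synth.resume();
            synth.speak("Hello, world!", null);
            synth.waitEngineState(Synthesizer.QUEUE_EMPTY);

            // Release Engine resources.
            synth.deallocate();
            synth.waitEngineState(Engine.DEALLOCATED);
        }
    }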

Implementations

JSAPI 2.0 accommodates a wide range of devices and speech Engines. The API supports mechanisms to request a type of Engine and particular features of that Engine, as sketched below. An application may first request a Recognizer or Synthesizer by using the createEngine method. Applications may select Engines that support letter-to-sound rules (see getSupportsLetterToSound). They may also select Engines that support markup for either grammar or synthesis text specifications (see getSupportsMarkup). Finally, different recognition Engines support various sizes of vocabulary (see getVocabSupport).
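
As a sketch of that selection flow, an application might create an Engine and then inspect the features of the mode descriptor actually provided. The getEngineMode accessor, RecognizerMode.DEFAULT, and the Boolean-valued feature methods below are assumptions consistent with the method names above, not normative signatures:

    import javax.speech.EngineManager;
    import javax.speech.recognition.Recognizer;
    import javax.speech.recognition.RecognizerMode;

    public class InspectEngine {
        public static void main(String[] args) throws Exception {
            // Request any Recognizer, then examine what was actually provided.
            Recognizer rec =
                (Recognizer) EngineManager.createEngine(RecognizerMode.DEFAULT);
            RecognizerMode mode = (RecognizerMode) rec.getEngineMode();

            // Feature queries; a null Boolean would mean "unspecified".
            System.out.println("Markup support:        " + mode.getSupportsMarkup());
            System.out.println("Letter-to-sound rules: " + mode.getSupportsLetterToSound());
            System.out.println("Vocabulary support:    " + mode.getVocabSupport());
        }
    }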

A complete implementation will support both recognition and synthesis with letter-to-sound rules and markup support. A Recognizer with a MEDIUM_SIZE vocabulary or greater should be sufficient for most applications.

In some cases, lack of support for a feature affects platform independence for an application. If an Engine does not support letter-to-sound rules, the application may require knowledge about vocabulary limitations. If a Recognizer does not support markup, the application may require knowledge about specific built-in grammars; markup support is also required for programmatic definition of grammars. If a Synthesizer does not support markup, synthesis requests will be rendered as simple text strings or audio. The vocabulary support size may vary from device to device even with the same Engine due to device constraints. An application can test for markup support at run time and fall back accordingly, as illustrated below.
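
One possible fallback is sketched below, again assuming the mode descriptor exposes getSupportsMarkup as a Boolean; the class and method names here are illustrative only:

    import javax.speech.synthesis.Synthesizer;
    import javax.speech.synthesis.SynthesizerMode;

    public class MarkupFallback {
        // Speak with markup when supported, otherwise fall back to plain text.
        static void speakBest(Synthesizer synth, String markup, String plain)
                throws Exception {
            SynthesizerMode mode = (SynthesizerMode) synth.getEngineMode();
            if (Boolean.TRUE.equals(mode.getSupportsMarkup())) {
                synth.speakMarkup(markup, null); // markup-aware rendering
            } else {
                synth.speak(plain, null);        // simple text rendering
            }
        }
    }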

In all cases, a JSAPI 2.0 Engine must pass the TCK tests appropriate for the features supported.

The following table summarizes those classes and methods affected by supported features; a usage sketch follows the table:

Feature                   Source                                       Conditions
getSupportsMarkup         GrammarManager.createRuleGrammar
getSupportsMarkup         GrammarManager.loadGrammar (all overloads)   "application/srgs+xml"
getSupportsLetterToSound  Synthesizer.speak (all overloads)            vocabulary not present in Engine
getSupportsLetterToSound  Synthesizer.speakMarkup                      vocabulary not present in Engine
getSupportsLetterToSound  GrammarManager.loadGrammar (all overloads)   vocabulary not present in Engine
getSupportsLetterToSound  Recognizer.resume                            vocabulary not present in Engine
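
For example, loading an SRGS XML grammar by reference might look like the sketch below. The grammar reference string is hypothetical, and the two-argument loadGrammar form is one of several overloads; consult GrammarManager for the full set:

    import javax.speech.Engine;
    import javax.speech.EngineManager;
    import javax.speech.recognition.Grammar;
    import javax.speech.recognition.GrammarManager;
    import javax.speech.recognition.Recognizer;
    import javax.speech.recognition.RecognizerMode;

    public class LoadGrammarDemo {
        public static void main(String[] args) throws Exception {
            Recognizer rec =
                (Recognizer) EngineManager.createEngine(RecognizerMode.DEFAULT);
            rec.allocate();
            rec.waitEngineState(Engine.ALLOCATED);

            // Load an SRGS XML grammar by reference; requires markup support.
            GrammarManager gm = rec.getGrammarManager();
            Grammar g = gm.loadGrammar("grammar:commands",      // hypothetical reference
                                       "application/srgs+xml"); // SRGS media type

            // Begin listening against the loaded grammar.
            rec.requestFocus();
            rec.resume();
            // ... handle results, then clean up ...
            rec.deallocate();
        }
    }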

Security Considerations

The Java Speech API 2.0 Security addendum defines the security requirements for the Java Speech API 2.0.

JSAPI 2.0 also establishes interaction levels between trusted and untrusted Synthesizers with system properties. If more than one Synthesizer instance is active, interaction between the instances may be affected by the setInterruptibility method, with system properties establishing the defaults and limits for this interaction.
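
For illustration, an application might restrict how easily its spoken output can be interrupted. The WORD_LEVEL constant below is an assumption about the defined interruptibility levels; see Synthesizer for the actual constants and the limits the system properties impose:

    import javax.speech.synthesis.Synthesizer;

    public class InterruptibilityDemo {
        // Allow queued items to be interrupted only at word boundaries
        // (WORD_LEVEL is assumed; trusted/untrusted limits still apply).
        static void configure(Synthesizer synth) {
            synth.setInterruptibility(Synthesizer.WORD_LEVEL);
        }
    }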

FAQ

Question: What is the target platform for this API?
Answer: Java ME - specifically CLDC 1.0 and MIDP 1.0. However, this API works equally well with Java SE.

Question: What is the footprint for a JSAPI 2.0 implementation?
Answer: Implementations can require 0.5-1.5 MBytes of ROM for models and algorithms and approximately 128 KBytes of RAM depending on vocabulary and grammar size. The footprint can also depend on the underlying technology.

Question: How do I get started?
Answer: Please see the "Hello World" recognition example and synthesis example. Please see the Table of Examples for more examples.

Question: Can I mix speech with other modalities?
Answer: Yes, developers may easily implement multimodal interaction in their applications. Events in the API support this interaction and may be synchronized with other event queues.
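
As a sketch of such event-driven integration, a SpeakableListener can forward synthesis progress into the application's own event handling. The single speakableUpdate callback and the WORD_STARTED event id are assumptions about SpeakableEvent:

    import javax.speech.synthesis.SpeakableEvent;
    import javax.speech.synthesis.SpeakableListener;
    import javax.speech.synthesis.Synthesizer;

    public class MultimodalSync {
        // Forward word-start notifications so speech can be synchronized
        // with graphics, sound, or other modalities.
        static void speakWithEvents(Synthesizer synth, String text) {
            synth.speak(text, new SpeakableListener() {
                public void speakableUpdate(SpeakableEvent e) {
                    if (e.getId() == SpeakableEvent.WORD_STARTED) {
                        // e.g. advance a word highlight in the UI here
                    }
                }
            });
        }
    }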

Question: Can I dictate to my device?
Answer: The API can support dictation, but implementations typically require tens to hundreds of MBytes of memory. However, as footprints shrink and device capabilities grow, dictation may become viable in the near future.

Question: Why stick with integers since CLDC 1.1 is out?
Answer: Experience tells us that devices limited to fixed-point arithmetic will remain in use for some time. Floating point requires additional power and chip area, both of which are best avoided on embedded devices.

JSAPI 2.0 Revision History

Date           Description             Version
May-2004       Community Review        0.8.0.61
March-2005     Public Review           0.9.0.3
February-2007  Proposed Final Draft    0.9.1.1
June-2007      Proposed Final Draft 2  0.9.2.1
February-2009  Final Release           2.0.6.0

Major Changes from JSAPI 1

Related Literature

Java Speech API (1.0)
Connected, Limited Device Configuration (JSR-30)
Mobile Information Device Profile (JSR-37)
Mobile Information Device Profile 2.0 (JSR-118)
Mobile Media API (JSR-135)
Speech Recognition Grammar Specification Version 1.0
Speech Synthesis Markup Language Version 1.0
International Phonetic Alphabet (IPA)

Acknowledgments

Many companies and individuals have provided significant contributions to this specification. Their efforts are much appreciated.

Since:
CLDC 1.0, MIDP 1.0, J2SE 1.3.1
Version:
JSAPI 2.0


Java™ Speech API 2.0, Final Release v2.0.6.
© 2008, Conversay and Sun Microsystems.
