JSAPI 2.0

Java Speech API Specification, 2.0.


Java Speech Packages
javax.speech               Provides interfaces and classes for connecting to speech Engines.
javax.speech.recognition   Provides interfaces and classes for using speech Recognizers.
javax.speech.spi           Provides interfaces for implementing speech Engines.
javax.speech.synthesis     Provides interfaces and classes for using speech Synthesizers.


Description

The Java Speech API (JSAPI 2.0) allows developers to easily incorporate speech technology into their Java programming language MIDlets (or applets) and applications. This API specifies a cross-platform interface to support speech recognizers and speech synthesizers, with considerations for future incorporation of dictation and other speech technologies.

This compact and efficient API supports both speech recognition and speech synthesis technologies either separately or together. Speech Engines may range in size depending on supported capabilities and application needs. Recognition Engines may provide full support for application-defined grammars or provide more limited support through specialized built-in grammars. Synthesis Engines may support full text-to-speech capabilities or simple text and audio sequencing.

JSAPI 2.0 is primarily aimed at the Java ME platform (specifically CLDC 1.0 and MIDP 1.0). At the same time, it works well on Java SE by avoiding platform-specific APIs.

JSAPI 2.0 applies to many applications. It is designed to support not only the limited cases of name dialing and digit dialing, but also a wide range of richer speech applications.

JSAPI 2.0 is especially well suited for downloadable applications, opening rich new interaction possibilities on devices with limited size yet unlimited potential.

JSAPI 2.0 is designed both for the average Java programmer and for people with speech expertise who need more detailed control. To get started, please see the "Hello World" recognition example and synthesis example. The Table of Examples provides additional examples, the Table of Diagrams provides links to helpful diagrams, and the FAQ below provides additional information.
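
For instance, a minimal synthesis sketch in the spirit of the specification's "Hello World" example might look like the following (error handling is simplified; SynthesizerMode.DEFAULT and the state constants follow the usual JSAPI 2.0 pattern):

    import javax.speech.Engine;
    import javax.speech.EngineManager;
    import javax.speech.synthesis.Synthesizer;
    import javax.speech.synthesis.SynthesizerMode;

    public class HelloSynthesis {
        public static void main(String[] args) throws Exception {
            // Request any available Synthesizer matching the default mode.
            Synthesizer synth =
                (Synthesizer) EngineManager.createEngine(SynthesizerMode.DEFAULT);

            // Acquire Engine resources and wait for allocation to complete.
            synth.allocate();
            synth.waitEngineState(Engine.ALLOCATED);

            // Queue plain text and wait until it has been spoken.
            synth.resume();
            synth.speak("Hello, world!", null);
            synth.waitEngineState(Synthesizer.QUEUE_EMPTY);

            // Release Engine resources.
            synth.deallocate();
            synth.waitEngineState(Engine.DEALLOCATED);
        }
    }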

Implementations

JSAPI 2.0 accommodates a wide range of devices and speech Engines. The API supports mechanisms to request a type of Engine and particular features of that Engine, as sketched below. An application may first request a Recognizer or Synthesizer by using the createEngine method. Applications may select Engines that support letter-to-sound rules (see getSupportsLetterToSound). They may also select Engines that support markup for either grammar or synthesis text specifications (see getSupportsMarkup). Finally, different recognition Engines support various sizes of vocabulary (see getVocabSupport).
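
As a sketch of that selection flow, an application might create an Engine and then inspect the features of the mode descriptor actually provided. The getEngineMode accessor, RecognizerMode.DEFAULT, and the Boolean-valued feature methods below are assumptions consistent with the method names above, not normative signatures:

    import javax.speech.EngineManager;
    import javax.speech.recognition.Recognizer;
    import javax.speech.recognition.RecognizerMode;

    public class InspectEngine {
        public static void main(String[] args) throws Exception {
            // Request any Recognizer, then examine what was actually provided.
            Recognizer rec =
                (Recognizer) EngineManager.createEngine(RecognizerMode.DEFAULT);
            RecognizerMode mode = (RecognizerMode) rec.getEngineMode();

            // Feature queries; a null Boolean would mean "unspecified".
            System.out.println("Markup support:        " + mode.getSupportsMarkup());
            System.out.println("Letter-to-sound rules: " + mode.getSupportsLetterToSound());
            System.out.println("Vocabulary support:    " + mode.getVocabSupport());
        }
    }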

A complete implementation will support both recognition and synthesis with letter-to-sound rules and markup support. A Recognizer with a MEDIUM_SIZE vocabulary or greater should be sufficient for most applications.

In some cases, lack of support for a feature affects platform independence for an application. If an Engine does not support letter-to-sound rules, the application may require knowledge about vocabulary limitations. If a Recognizer does not support markup, the application may require knowledge about specific built-in grammars; markup support is also required for programmatic definition of grammars. If a Synthesizer does not support markup, synthesis requests will be rendered as simple text strings or audio. The vocabulary support size may vary from device to device even with the same Engine due to device constraints. An application can test for markup support at run time and fall back accordingly, as illustrated below.
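
One possible fallback is sketched below, again assuming the mode descriptor exposes getSupportsMarkup as a Boolean; the class and method names here are illustrative only:

    import javax.speech.synthesis.Synthesizer;
    import javax.speech.synthesis.SynthesizerMode;

    public class MarkupFallback {
        // Speak with markup when supported, otherwise fall back to plain text.
        static void speakBest(Synthesizer synth, String markup, String plain)
                throws Exception {
            SynthesizerMode mode = (SynthesizerMode) synth.getEngineMode();
            if (Boolean.TRUE.equals(mode.getSupportsMarkup())) {
                synth.speakMarkup(markup, null); // markup-aware rendering
            } else {
                synth.speak(plain, null);        // simple text rendering
            }
        }
    }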

In all cases, a JSAPI 2.0 Engine must pass the TCK tests appropriate for the features supported.

The following table summarizes those classes and methods affected by supported features; a usage sketch follows the table:

Feature                   Source                                       Conditions
getSupportsMarkup         GrammarManager.createRuleGrammar
getSupportsMarkup         GrammarManager.loadGrammar (all overloads)   "application/srgs+xml"
getSupportsLetterToSound  Synthesizer.speak (all overloads)            vocabulary not present in Engine
getSupportsLetterToSound  Synthesizer.speakMarkup                      vocabulary not present in Engine
getSupportsLetterToSound  GrammarManager.loadGrammar (all overloads)   vocabulary not present in Engine
getSupportsLetterToSound  Recognizer.resume                            vocabulary not present in Engine
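
For example, loading an SRGS XML grammar by reference might look like the sketch below. The grammar reference string is hypothetical, and the two-argument loadGrammar form is one of several overloads; consult GrammarManager for the full set:

    import javax.speech.Engine;
    import javax.speech.EngineManager;
    import javax.speech.recognition.Grammar;
    import javax.speech.recognition.GrammarManager;
    import javax.speech.recognition.Recognizer;
    import javax.speech.recognition.RecognizerMode;

    public class LoadGrammarDemo {
        public static void main(String[] args) throws Exception {
            Recognizer rec =
                (Recognizer) EngineManager.createEngine(RecognizerMode.DEFAULT);
            rec.allocate();
            rec.waitEngineState(Engine.ALLOCATED);

            // Load an SRGS XML grammar by reference; requires markup support.
            GrammarManager gm = rec.getGrammarManager();
            Grammar g = gm.loadGrammar("grammar:commands",      // hypothetical reference
                                       "application/srgs+xml"); // SRGS media type

            // Begin listening against the loaded grammar.
            rec.requestFocus();
            rec.resume();
            // ... handle results, then clean up ...
            rec.deallocate();
        }
    }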

Security Considerations

The Java Speech API 2.0 Security addendum defines the security requirements for the Java Speech API 2.0.

JSAPI 2.0 also establishes interaction levels between trusted and untrusted Synthesizers with system properties. If more than one Synthesizer instance is active, interaction between the instances may be affected by the setInterruptibility method, with system properties establishing the defaults and limits for this interaction.
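
For illustration, an application might restrict how easily its spoken output can be interrupted. The WORD_LEVEL constant below is an assumption about the defined interruptibility levels; see Synthesizer for the actual constants and the limits the system properties impose:

    import javax.speech.synthesis.Synthesizer;

    public class InterruptibilityDemo {
        // Allow queued items to be interrupted only at word boundaries
        // (WORD_LEVEL is assumed; trusted/untrusted limits still apply).
        static void configure(Synthesizer synth) {
            synth.setInterruptibility(Synthesizer.WORD_LEVEL);
        }
    }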

FAQ

Question: What is the target platform for this API?
Answer: Java ME - specifically CLDC 1.0 and MIDP 1.0. However, this API works equally well with Java SE.

Question: What is the footprint for a JSAPI 2.0 implementation?
Answer: Implementations can require 0.5-1.5 MBytes of ROM for models and algorithms and approximately 128 KBytes of RAM depending on vocabulary and grammar size. The footprint can also depend on the underlying technology.

Question: How do I get started?
Answer: Please see the "Hello World" recognition example and synthesis example. Please see the Table of Examples for more examples.

Question: Can I mix speech with other modalities?
Answer: Yes, developers may easily implement multimodal interaction in their applications. Events in the API support this interaction and may be synchronized with other event queues.
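
As a sketch of such event-driven integration, a SpeakableListener can forward synthesis progress into the application's own event handling. The single speakableUpdate callback and the WORD_STARTED event id are assumptions about SpeakableEvent:

    import javax.speech.synthesis.SpeakableEvent;
    import javax.speech.synthesis.SpeakableListener;
    import javax.speech.synthesis.Synthesizer;

    public class MultimodalSync {
        // Forward word-start notifications so speech can be synchronized
        // with graphics, sound, or other modalities.
        static void speakWithEvents(Synthesizer synth, String text) {
            synth.speak(text, new SpeakableListener() {
                public void speakableUpdate(SpeakableEvent e) {
                    if (e.getId() == SpeakableEvent.WORD_STARTED) {
                        // e.g. advance a word highlight in the UI here
                    }
                }
            });
        }
    }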

Question: Can I dictate to my device?
Answer: The API can support dictation, but implementations typically require tens to hundreds of MBytes of memory. However, as footprints shrink and device capabilities grow, dictation may become viable in the near future.

Question: Why stick with integers since CLDC 1.1 is out?
Answer: Experience tells us that devices limited to fixed-point arithmetic will remain in use for some time. Floating point requires additional power and chip area, both of which are best avoided on embedded devices.

JSAPI 2.0 Revision History

Date           Description             Version
May-2004       Community Review        0.8.0.61
March-2005     Public Review           0.9.0.3
February-2007  Proposed Final Draft    0.9.1.1
June-2007      Proposed Final Draft 2  0.9.2.1
February-2009  Final Release           2.0.6.0

Major Changes from JSAPI 1

Related Literature

Java Speech API (1.0)
Connected, Limited Device Configuration (JSR-30)
Mobile Information Device Profile (JSR-37)
Mobile Information Device Profile 2.0 (JSR-118)
Mobile Media API (JSR-135)
Speech Recognition Grammar Specification Version 1.0
Speech Synthesis Markup Language Version 1.0
International Phonetic Alphabet (IPA)

Acknowledgments

Many companies and individuals have provided significant contributions to this specification. Their efforts are much appreciated.

Since:
CLDC 1.0, MIDP 1.0, J2SE 1.3.1
Version:
JSAPI 2.0


Java™ Speech API 2.0, Final Release v2.0.6.
© 2008, Conversay and Sun Microsystems.
