Java Speech Packages

| Package | Description |
|---|---|
| javax.speech | Provides interfaces and classes for connecting to speech Engines. |
| javax.speech.recognition | Provides interfaces and classes for using speech Recognizers. |
| javax.speech.spi | Provides interfaces for implementing speech Engines. |
| javax.speech.synthesis | Provides interfaces and classes for using speech Synthesizers. |
Java Speech API Specification, 2.0.
The Java Speech API (JSAPI 2.0) allows developers to easily incorporate speech technology into their Java programming language MIDlets (or applets) and applications. This API specifies a cross-platform interface to support speech recognizers and speech synthesizers, with considerations for future incorporation of dictation and other speech technologies.
This compact and efficient API supports speech recognition and speech synthesis technologies, either separately or together. Speech Engines may range in size depending on supported capabilities and application needs. Recognition Engines may provide full support for application-defined grammars or more limited support through specialized built-in grammars. Synthesis Engines may support full text-to-speech capabilities or simple text and audio sequencing.
JSAPI 2.0 is primarily aimed at the Java ME platform (specifically CLDC 1.0 and MIDP 1.0). At the same time, it works well on Java SE because it avoids platform-specific APIs.
JSAPI 2.0 applies to many applications. It is designed to support not only the limited cases of name dialing and digit dialing, but a broad range of richer speech applications as well.
JSAPI 2.0 is designed both for the average Java programmer and for people with speech expertise who need more detailed control. To get started, please see the "Hello World" recognition example and synthesis example. The Table of Examples provides additional examples, the Table of Diagrams provides links to helpful diagrams, and the FAQ below provides additional information.
JSAPI 2.0 centers on speech Engines. The API supports mechanisms to request a type of Engine and particular features of that Engine. An application may first request a Recognizer or Synthesizer by using the createEngine method.
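The engine lifecycle described above can be sketched as follows. This is an illustrative sketch only: it assumes a JSAPI 2.0 implementation is installed on the device, and the exact state constants and speak signature may differ between implementations.

```java
import javax.speech.EngineManager;
import javax.speech.synthesis.Synthesizer;
import javax.speech.synthesis.SynthesizerMode;

public class HelloSynthesis {
    public static void main(String[] args) throws Exception {
        // Request any available Synthesizer; a Recognizer would be
        // requested the same way with a RecognizerMode instead.
        Synthesizer synth =
            (Synthesizer) EngineManager.createEngine(SynthesizerMode.DEFAULT);

        synth.allocate();                   // acquire engine resources
        synth.waitEngineState(Synthesizer.ALLOCATED);

        synth.speak("Hello, world!", null); // queue plain text, no listener
        synth.waitEngineState(Synthesizer.QUEUE_EMPTY);

        synth.deallocate();                 // release engine resources
    }
}
```

The allocate/deallocate pair matters on small devices: engine resources such as acoustic models can be large, so applications typically allocate lazily and release promptly.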
Applications may select Engines that support letter-to-sound rules (see getSupportsLetterToSound). They may also select Engines that support markup for either grammar or synthesis text specifications (see getSupportsMarkup). Finally, different recognition Engines support various sizes of vocabulary (see getVocabSupport). A complete implementation will support both recognition and synthesis with letter-to-sound rules and markup support. A Recognizer with a MEDIUM_SIZE vocabulary or greater should be sufficient for most applications.
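A feature check along the lines described above might look like the following sketch. The getter names (getSupportsMarkup, getSupportsLetterToSound, getVocabSupport) and MEDIUM_SIZE come from this specification text; the surrounding calls, and the assumption that the mode getters return Boolean wrappers, are illustrative and may not match a particular implementation exactly.

```java
import javax.speech.EngineManager;
import javax.speech.recognition.Recognizer;
import javax.speech.recognition.RecognizerMode;

public class SelectByFeature {
    public static void main(String[] args) throws Exception {
        Recognizer rec =
            (Recognizer) EngineManager.createEngine(RecognizerMode.DEFAULT);

        // Inspect the mode that actually describes this engine.
        RecognizerMode mode = (RecognizerMode) rec.getEngineMode();

        Boolean markup = mode.getSupportsMarkup();        // SRGS grammars?
        Boolean lts = mode.getSupportsLetterToSound();    // open vocabulary?
        int vocab = mode.getVocabSupport();               // vocabulary size class

        if (Boolean.TRUE.equals(markup)
                && vocab >= RecognizerMode.MEDIUM_SIZE) {
            // Safe to load application-defined grammars of moderate size.
        } else if (!Boolean.TRUE.equals(lts)) {
            // Fall back: the application must stay within the built-in
            // vocabulary, as described in the paragraph below.
        }
    }
}
```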
In some cases, lack of support for a feature affects platform independence for an application. If an Engine does not support letter-to-sound rules, the application may require knowledge about vocabulary limitations. If a Recognizer does not support markup, the application may require knowledge about specific built-in grammars. Markup support is also required for programmatic definition of grammars. If a Synthesizer does not support markup, synthesis requests will be rendered as simple text strings or audio. The vocabulary support size may vary from device to device even with the same Engine due to device constraints.
In all cases, a JSAPI 2.0 engine must pass the TCK tests appropriate for the features supported.
The following table summarizes those classes and methods affected by supported features:
| Feature | Source | Conditions |
|---|---|---|
| getSupportsMarkup | GrammarManager.createRuleGrammar | |
| getSupportsMarkup | GrammarManager.loadGrammar | "application/srgs+xml" |
| getSupportsLetterToSound | Synthesizer.speak | vocabulary not present in Engine |
| getSupportsLetterToSound | Synthesizer.speakMarkup | vocabulary not present in Engine |
| getSupportsLetterToSound | GrammarManager.loadGrammar | vocabulary not present in Engine |
| getSupportsLetterToSound | Recognizer.resume | vocabulary not present in Engine |
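To make the table above concrete, here is a hedged sketch of loading an SRGS grammar, the operation gated by getSupportsMarkup. The loadGrammar signature shown (grammar reference plus media type) and the "grammar:commands" reference are assumptions for illustration; consult the GrammarManager documentation of your implementation for the exact overloads.

```java
import javax.speech.EngineManager;
import javax.speech.recognition.GrammarManager;
import javax.speech.recognition.Recognizer;
import javax.speech.recognition.RecognizerMode;

public class LoadSrgsGrammar {
    public static void main(String[] args) throws Exception {
        Recognizer rec =
            (Recognizer) EngineManager.createEngine(RecognizerMode.DEFAULT);
        rec.allocate();
        rec.waitEngineState(Recognizer.ALLOCATED);

        GrammarManager gm = rec.getGrammarManager();

        // Loading SRGS markup requires getSupportsMarkup (see table above).
        // "grammar:commands" is a hypothetical grammar reference.
        gm.loadGrammar("grammar:commands", "application/srgs+xml");

        // resume() may need letter-to-sound rules if the grammar contains
        // words not present in the Engine's vocabulary (see table above).
        rec.resume();
    }
}
```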
JSAPI 2.0 also establishes interaction levels between trusted and untrusted Synthesizers with System properties. If more than one Synthesizer instance is active, then interaction between them may be affected by the setInterruptibility method.
The following system properties establish defaults and limits
for this interaction:
javax.speech.synthesizer.defaultTrustedInterruptibility
javax.speech.synthesizer.defaultUntrustedInterruptibility
javax.speech.synthesizer.maximumUntrustedInterruptibility
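A minimal, runnable sketch of how a platform or implementation might consult these properties. The property name comes from the list above; the numeric encoding and the fallback value of 50 are hypothetical placeholders, since the specification text here does not define the value format.

```java
public class InterruptibilityProps {
    // Property name as listed in the specification text above.
    static final String TRUSTED_DEFAULT =
        "javax.speech.synthesizer.defaultTrustedInterruptibility";

    // Read a system property as an int, falling back to a caller-supplied
    // default when the property is unset or malformed.
    static int readLevel(String name, int fallback) {
        String raw = System.getProperty(name);
        if (raw == null) return fallback;
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Simulate a platform that configures the trusted default.
        System.setProperty(TRUSTED_DEFAULT, "75");
        System.out.println(readLevel(TRUSTED_DEFAULT, 50)); // prints 75
    }
}
```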
Question: What is the footprint for a JSAPI 2.0 implementation?
Answer: Implementations can require 0.5-1.5 MBytes of ROM
for models and algorithms and approximately 128 KBytes of RAM
depending on vocabulary and grammar size. The footprint can also
depend on the underlying technology.
Question: How do I get started?
Answer: Please see the "Hello World" recognition example and synthesis example. Please see the Table of Examples for more examples.
Question: Can I mix speech with other modalities?
Answer: Yes, developers may easily implement multimodal interaction in
their applications. Events in the API support this interaction and
may be synchronized with other event queues.
Question: Can I dictate to my device?
Answer: The API can support dictation, but implementations typically require tens to hundreds of megabytes of memory. However, as footprints shrink and device capabilities grow, this may become viable in the near future.
Question: Why stick with integers since CLDC 1.1 is out?
Answer: Experience tells us that devices with only fixed-point arithmetic will continue for some time into the future. Floating point requires additional power and chip area, both of which are best avoided on embedded devices.
| Date | Description | Version |
|---|---|---|
| May-2004 | Community Review | 0.8.0.61 |
| March-2005 | Public Review | 0.9.0.3 |
| February-2007 | Proposed Final Draft | 0.9.1.1 |
| June-2007 | Proposed Final Draft 2 | 0.9.2.1 |
| February-2009 | Final Release | 2.0.6.0 |
- SpeechEventExecutor.
- getResourceAsStream to find Engines.
- String.
- StringReader eliminated.
- AudioManager compatible with JSR-135.
- AudioSegment class introduced to contain audio data.
- Word class now supports specification of audio for use with recognition and synthesis.
- Synthesizer focus.
Java™ Speech API 2.0, Final Release v2.0.6.
© 2008, Conversay and Sun Microsystems.