java.lang.Object com.sun.speech.freetts.lexicon.LexiconImpl
Provides an implementation of a Lexicon.
This implementation reads from either a plain ASCII file or a binary file. When reading from an ASCII file, you can specify when each input line is tokenized: load, lookup, or never. If you specify 'load', the entire file is parsed when it is loaded. If you specify 'lookup', the file is loaded, but parsing of each line is delayed until the line is first referenced, and the parsed form is then saved. If you specify 'never', the lines are parsed each time they are referenced. The default is 'never'. To specify the load type, set the system property as follows:
-Dcom.sun.speech.freetts.lexicon.LexTokenize=load
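The three tokenization modes can be illustrated with a small, self-contained sketch. It is not the FreeTTS code: the class `TokenizeSketch`, its `Mode` enum, and the helpers `add`, `get`, and `isParsed` are hypothetical names chosen for illustration, and only the property name and its three values come from the text above.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only; a sketch of the load/lookup/never modes, not the FreeTTS implementation. */
final class TokenizeSketch {
    enum Mode { LOAD, LOOKUP, NEVER }

    // Each value is either the raw phone String or the parsed String[].
    private final Map<String, Object> entries = new HashMap<>();
    private final Mode mode;

    TokenizeSketch(Mode mode) {
        this.mode = mode;
    }

    /** Reads the property named in the text; absent or unrecognized values mean NEVER. */
    static Mode modeFromProperty() {
        String value = System.getProperty(
                "com.sun.speech.freetts.lexicon.LexTokenize", "never");
        if ("load".equals(value)) {
            return Mode.LOAD;
        }
        if ("lookup".equals(value)) {
            return Mode.LOOKUP;
        }
        return Mode.NEVER;
    }

    /** LOAD parses immediately; the other modes store the raw string. */
    void add(String key, String phones) {
        entries.put(key, mode == Mode.LOAD ? phones.split(" ") : phones);
    }

    /** LOOKUP parses on first access and keeps the result; NEVER parses on every call. */
    String[] get(String key) {
        Object value = entries.get(key);
        if (value == null) {
            return null;
        }
        if (value instanceof String[]) {
            return (String[]) value;
        }
        String[] parsed = ((String) value).split(" ");
        if (mode == Mode.LOOKUP) {
            entries.put(key, parsed); // replace the string with the array
        }
        return parsed;
    }

    /** True if the stored value is already an array (for demonstration only). */
    boolean isParsed(String key) {
        return entries.get(key) instanceof String[];
    }
}
```

The trade-off the sketch makes visible: 'load' pays the parsing cost for every entry up front, 'lookup' pays it once per entry actually used, and 'never' keeps the stored form compact at the cost of re-parsing on each access.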
If a binary file is used, you can also specify whether the new IO package is used. The new IO package (java.nio) was introduced in JDK 1.4 and can greatly improve the speed of loading files. To enable new IO, use the following system property (it is enabled by default):
-Dcom.sun.speech.freetts.useNewIO=true
The implementation also allows users to define their own addenda that will be used in addition to the system addenda. If the user defines their own addenda, its values are added to the system addenda, overriding any existing elements in the system addenda. To define user addenda, set the following property:
-Dcom.sun.speech.freetts.lexicon.userAddenda=<URLToUserAddenda>
where <URLToUserAddenda> is a URL pointing to an ASCII file containing addenda entries.
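The override rule for user addenda can be pictured as a plain map merge. The sketch below is an assumption about the behaviour described above, not the library's code: `AddendaMergeSketch` and `merge` are hypothetical names, and the keys and phone strings are made-up sample data.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only; shows "user entries override system entries" as a map merge. */
final class AddendaMergeSketch {
    static Map<String, String> merge(Map<String, String> systemAddenda,
                                     Map<String, String> userAddenda) {
        // Start from a copy so the system addenda map itself is left untouched.
        Map<String, String> merged = new HashMap<>(systemAddenda);
        // putAll replaces the value of any key that already exists.
        merged.putAll(userAddenda);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> system = new HashMap<>();
        system.put("java0", "jh aa v ax");
        system.put("sun0", "s ah n");

        Map<String, String> user = new HashMap<>();
        user.put("java0", "jh ae v ax"); // overrides the system entry
        user.put("tts0", "t iy t iy eh s"); // new entry

        System.out.println(merge(system, user));
    }
}
```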
[[[TODO: support multiple homographs with the same part of speech.]]]
Field Summary | |
protected boolean |
tokenizeOnLoad
If true, the phone string is replaced with the phone array in the hashmap when the phone array is loaded. |
protected boolean |
tokenizeOnLookup
If true, the phone string is replaced with the phone array in the hashmap when the phone array is first looked up. |
Constructor Summary | |
LexiconImpl()
Class constructor for an empty Lexicon. |
|
LexiconImpl(java.net.URL compiledURL,
java.net.URL addendaURL,
java.net.URL letterToSoundURL,
boolean binary)
Create a new LexiconImpl by reading from the given URLs. |
Method Summary | |
void |
addAddendum(java.lang.String word,
java.lang.String partOfSpeech,
java.lang.String[] phones)
Adds a word to the addenda. |
boolean |
compare(LexiconImpl other)
Tests to see if this lexicon is identical to the other for debugging purposes. |
protected java.util.Map |
createLexicon(java.io.InputStream is,
boolean binary,
int estimatedSize)
Reads the given input stream as lexicon data and returns the results in a Map. |
void |
dumpBinary(java.lang.String path)
Dumps this lexicon (just the compiled form). |
protected static java.lang.String |
fixPartOfSpeech(java.lang.String partOfSpeech)
Fixes the part of speech if it is null. |
protected java.lang.String[] |
getPhones(java.util.Map lexicon,
java.lang.String wordAndPartOfSpeech)
Gets a phone list for a word from a given lexicon. |
protected java.lang.String[] |
getPhones(java.util.Map lexicon,
java.lang.String word,
java.lang.String partOfSpeech)
Gets a phone list for a word from a given lexicon. |
protected java.lang.String[] |
getPhones(java.lang.String phones)
Turns the phone String into a String[], using " " as the delimiter. |
java.lang.String[] |
getPhones(java.lang.String word,
java.lang.String partOfSpeech)
Gets the phone list for a given word. |
java.lang.String[] |
getPhones(java.lang.String word,
java.lang.String partOfSpeech,
boolean useLTS)
Gets the phone list for a given word. |
boolean |
isLoaded()
Determines if this lexicon is loaded. |
void |
load()
Loads the data for this lexicon. |
protected java.util.Map |
loadTextLexicon(java.io.InputStream is,
int estimatedSize)
Reads the given input stream as text lexicon data and returns the results in a Map. |
protected void |
parseAndAdd(java.util.Map lexicon,
java.lang.String line)
Creates a word from the given input line and adds it to the lexicon. |
void |
removeAddendum(java.lang.String word,
java.lang.String partOfSpeech)
Removes a word from the addenda. |
protected void |
setLexiconParameters(java.net.URL compiledURL,
java.net.URL addendaURL,
java.net.URL letterToSoundURL,
boolean binary)
Sets the lexicon parameters. |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Methods inherited from interface com.sun.speech.freetts.lexicon.Lexicon |
isSyllableBoundary |
Field Detail |
protected boolean tokenizeOnLoad
protected boolean tokenizeOnLookup
Constructor Detail |
public LexiconImpl(java.net.URL compiledURL, java.net.URL addendaURL, java.net.URL letterToSoundURL, boolean binary)
Create a new LexiconImpl by reading from the given URLs.
Parameters:
compiledURL - a URL pointing to the compiled lexicon
addendaURL - a URL pointing to lexicon addenda
letterToSoundURL - a URL pointing to the LetterToSound to use if a word cannot be found in the compiled form or the addenda
binary - if true, the input streams are binary; otherwise, they are text.

public LexiconImpl()
Class constructor for an empty Lexicon.
Method Detail |
protected void setLexiconParameters(java.net.URL compiledURL, java.net.URL addendaURL, java.net.URL letterToSoundURL, boolean binary)
Sets the lexicon parameters.
Parameters:
compiledURL - a URL pointing to the compiled lexicon
addendaURL - a URL pointing to lexicon addenda
letterToSoundURL - a URL pointing to the LetterToSound to use
binary - if true, the input streams are binary; otherwise, they are text.

public boolean isLoaded()
Determines if this lexicon is loaded.
Specified by:
isLoaded in interface Lexicon
Returns:
true if the lexicon is loaded

public void load() throws java.io.IOException
Loads the data for this lexicon.
Specified by:
load in interface Lexicon
Throws:
java.io.IOException - if errors occur during loading

protected java.util.Map createLexicon(java.io.InputStream is, boolean binary, int estimatedSize) throws java.io.IOException
Reads the given input stream as lexicon data and returns the results in a Map.
Parameters:
is - the input stream
binary - if true, the data is binary
estimatedSize - the estimated size of the lexicon
Throws:
java.io.IOException - if errors are encountered while reading the data

protected java.util.Map loadTextLexicon(java.io.InputStream is, int estimatedSize) throws java.io.IOException
Reads the given input stream as text lexicon data and returns the results in a Map.
Parameters:
is - the input stream
estimatedSize - the estimated number of entries of the lexicon
Throws:
java.io.IOException - if errors are encountered while reading the data

protected void parseAndAdd(java.util.Map lexicon, java.lang.String line)
Creates a word from the given input line and adds it to the lexicon.
Parameters:
lexicon - the lexicon
line - the input text

public java.lang.String[] getPhones(java.lang.String word, java.lang.String partOfSpeech)
Gets the phone list for a given word. If a phone list cannot be found, returns null. The format is lexicon dependent. If the part of speech does not matter, pass in null.
Specified by:
getPhones in interface Lexicon
Parameters:
word - the word to find
partOfSpeech - the part of speech
Returns:
the phone list for the word, or null

public java.lang.String[] getPhones(java.lang.String word, java.lang.String partOfSpeech, boolean useLTS)
Gets the phone list for a given word. If a phone list cannot be found, null is returned. The partOfSpeech is implementation dependent, but null always matches.
Specified by:
getPhones in interface Lexicon
Parameters:
word - the word to find
partOfSpeech - the part of speech or null
useLTS - whether to use the letter-to-sound rules when the word is not in the lexicon.
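
The fallback behaviour of useLTS can be sketched as a chain of lookups. This is a simplified illustration under stated assumptions, not the FreeTTS implementation: the order "addenda, then compiled lexicon, then letter-to-sound" is assumed from the class description, the key is assumed to be the word concatenated with the part of speech (with "0" standing in for null, as fixPartOfSpeech documents), and the rule that null matches any part of speech is not modelled. `LookupOrderSketch` and the `Function` standing in for the letter-to-sound rules are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Illustrative only; a simplified lookup chain with an optional letter-to-sound fallback. */
final class LookupOrderSketch {
    private final Map<String, String[]> addenda = new HashMap<>();
    private final Map<String, String[]> compiled = new HashMap<>();
    // Stand-in for the letter-to-sound rules: maps a word to a phone list.
    private final Function<String, String[]> letterToSound;

    LookupOrderSketch(Function<String, String[]> letterToSound) {
        this.letterToSound = letterToSound;
    }

    void addCompiled(String word, String partOfSpeech, String[] phones) {
        compiled.put(word + partOfSpeech, phones);
    }

    void addAddendum(String word, String partOfSpeech, String[] phones) {
        addenda.put(word + partOfSpeech, phones);
    }

    String[] getPhones(String word, String partOfSpeech, boolean useLTS) {
        // Assumption: a null part of speech is represented as "0".
        String key = word + (partOfSpeech == null ? "0" : partOfSpeech);
        String[] phones = addenda.get(key);      // addenda take precedence
        if (phones == null) {
            phones = compiled.get(key);          // then the compiled lexicon
        }
        if (phones == null && useLTS) {
            phones = letterToSound.apply(word);  // last resort: derive from spelling
        }
        return phones;                           // may still be null
    }
}
```

With useLTS set to false, a caller can tell whether a word is actually present in the lexicon, because an unknown word yields null instead of a rule-generated guess.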

protected java.lang.String[] getPhones(java.util.Map lexicon, java.lang.String word, java.lang.String partOfSpeech)
Gets a phone list for a word from a given lexicon. If a phone list cannot be found, returns null. The format is lexicon dependent. If the part of speech does not matter, pass in null.
Parameters:
lexicon - the lexicon
word - the word to find
partOfSpeech - the part of speech
Returns:
the phone list for the word, or null
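
protected java.lang.String[] getPhones(java.util.Map lexicon, java.lang.String wordAndPartOfSpeech)
Gets a phone list for a word from a given lexicon. If a phone list cannot be found, returns null.
Parameters:
lexicon - the lexicon
wordAndPartOfSpeech - word and part of speech concatenated together
Returns:
the phone list for the word, or null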

protected java.lang.String[] getPhones(java.lang.String phones)
Turns the phone String into a String[], using " " as the delimiter.
Parameters:
phones - the phones
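
Splitting on " " can be done in more than one way in Java, and the choices differ on input with repeated spaces. The sketch below contrasts two standard-library options; it does not claim which one this class uses, and `PhoneSplitSketch` and its method names are hypothetical.

```java
import java.util.StringTokenizer;

/** Illustrative only; two ways to turn a phone String into a String[]. */
final class PhoneSplitSketch {
    /** StringTokenizer treats runs of delimiters as one and never yields empty tokens. */
    static String[] withTokenizer(String phones) {
        StringTokenizer tokenizer = new StringTokenizer(phones, " ");
        String[] result = new String[tokenizer.countTokens()];
        for (int i = 0; i < result.length; i++) {
            result[i] = tokenizer.nextToken();
        }
        return result;
    }

    /** String.split keeps an empty element between two adjacent delimiters. */
    static String[] withSplit(String phones) {
        return phones.split(" ");
    }

    public static void main(String[] args) {
        String phones = "hh ax  l ow"; // note the double space
        System.out.println(withTokenizer(phones).length); // 4
        System.out.println(withSplit(phones).length);     // 5, includes one empty string
    }
}
```

For well-formed entries with single spaces both approaches give the same array; the difference only matters when the input file is not perfectly clean.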

public void addAddendum(java.lang.String word, java.lang.String partOfSpeech, java.lang.String[] phones)
Adds a word to the addenda.
Specified by:
addAddendum in interface Lexicon
Parameters:
word - the word to add
partOfSpeech - the part of speech
phones - the phones for the word

public void removeAddendum(java.lang.String word, java.lang.String partOfSpeech)
Removes a word from the addenda.
Specified by:
removeAddendum in interface Lexicon
Parameters:
word - the word to remove
partOfSpeech - the part of speech

public void dumpBinary(java.lang.String path)
Dumps this lexicon (just the compiled form).
Parameters:
path - the root path to dump it to

public boolean compare(LexiconImpl other)
Tests to see if this lexicon is identical to the other for debugging purposes.
Parameters:
other - the other lexicon to compare to

protected static java.lang.String fixPartOfSpeech(java.lang.String partOfSpeech)
Fixes the part of speech if it is null. The default representation of a null part of speech is the number "0".
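
The documented behaviour is small enough to restate as code. This is a minimal sketch written from the description above rather than from the source: `PartOfSpeechSketch` is a hypothetical class, and only the null-to-"0" mapping is taken from the text.

```java
/** Illustrative only; restates the documented null-to-"0" default. */
final class PartOfSpeechSketch {
    static String fixPartOfSpeech(String partOfSpeech) {
        // A null part of speech is represented by the string "0".
        return partOfSpeech == null ? "0" : partOfSpeech;
    }

    public static void main(String[] args) {
        System.out.println(fixPartOfSpeech(null)); // 0
        System.out.println(fixPartOfSpeech("n"));  // n
    }
}
```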