LexiconImpl (FreeTTS 1.2)

com.sun.speech.freetts.lexicon
Class LexiconImpl

java.lang.Object
  com.sun.speech.freetts.lexicon.LexiconImpl

All Implemented Interfaces:: Lexicon

Direct Known Subclasses:: CMULexicon

public abstract class LexiconImpl
extends java.lang.Object
implements Lexicon

Provides an implementation of a Lexicon.

This implementation reads from either a plain ASCII file or a binary file. When reading from an ASCII file, you can specify when each input line is tokenized: load, lookup, or never. If you specify 'load', the entire file is parsed when it is loaded. If you specify 'lookup', the file is loaded, but each line is parsed only when it is first referenced, and the parsed form is saved. If you specify 'never', each line is parsed every time it is referenced. The default is 'never'. To specify the load type, set the system property as follows:

   -Dcom.sun.speech.freetts.lexicon.LexTokenize=load
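As a rough illustration of how the three modes could be selected at runtime, the sketch below reads the property named above and maps its value onto two flags. The class `TokenizeModeSketch` and the method `flagsFor` are hypothetical names, and treating any unrecognized value as 'never' is an assumption, not documented behavior.

```java
/** Hypothetical sketch: maps a LexTokenize property value onto two flags. */
class TokenizeModeSketch {
    static final String PROPERTY = "com.sun.speech.freetts.lexicon.LexTokenize";

    /** Returns {tokenizeOnLoad, tokenizeOnLookup} for the given mode string. */
    static boolean[] flagsFor(String mode) {
        // Assumption: any value other than "load" or "lookup" behaves like "never".
        boolean onLoad = "load".equals(mode);
        boolean onLookup = "lookup".equals(mode);
        return new boolean[] { onLoad, onLookup };
    }

    public static void main(String[] args) {
        // Fall back to "never", the documented default.
        String mode = System.getProperty(PROPERTY, "never");
        boolean[] flags = flagsFor(mode);
        System.out.println("tokenizeOnLoad=" + flags[0]
                + ", tokenizeOnLookup=" + flags[1]);
    }
}
```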

If a binary file is used, you can also specify whether the new IO package (java.nio) is used. New IO was introduced in JDK 1.4 and can greatly improve the speed of loading files. New IO is enabled by default; to set it explicitly, use the following system property:

   -Dcom.sun.speech.freetts.useNewIO=true
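To make the two loading paths concrete, here is a minimal sketch that reads a binary file either through a memory-mapped `java.nio` buffer or through a classic stream, depending on the property. The class `NewIoSketch` and its methods are hypothetical, and it uses `java.nio.file` helpers that are newer than JDK 1.4. The actual binary loader may be organized quite differently.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: reads a file with or without new IO. */
class NewIoSketch {
    /** Reads the documented property; new IO is on unless set to false. */
    static boolean useNewIO() {
        return Boolean.parseBoolean(
                System.getProperty("com.sun.speech.freetts.useNewIO", "true"));
    }

    static byte[] readAll(Path file, boolean newIO) throws IOException {
        if (newIO) {
            // Map the whole file into memory and copy it out of the buffer.
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                ByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
                byte[] data = new byte[buf.remaining()];
                buf.get(data);
                return data;
            }
        }
        // Stream-based fallback.
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            byte[] data = new byte[(int) Files.size(file)];
            in.readFully(data);
            return data;
        }
    }
}
```

Both paths return the same bytes; only the way the file is read differs.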

The implementation also allows users to define their own addenda, which are used in addition to the system addenda. If the user defines their own addenda, its values are added to the system addenda, overriding any existing entries in the system addenda. To define user addenda, set the following property:

   -Dcom.sun.speech.freetts.lexicon.userAddenda=<URLToUserAddenda>

Where <URLToUserAddenda> is a URL pointing to an ASCII file containing addenda entries.
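The override rule can be pictured as a simple map merge. In the sketch below, the class `AddendaMergeSketch`, the key format, and the sample phone strings used to exercise it are illustrative assumptions only; it shows user entries replacing system entries that share a key.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: user addenda layered over system addenda. */
class AddendaMergeSketch {
    /** Returns a new map with the system entries, overridden by user entries. */
    static Map<String, String> merge(Map<String, String> system,
                                     Map<String, String> user) {
        Map<String, String> merged = new HashMap<String, String>(system);
        // putAll replaces any system entry whose key also appears in user.
        merged.putAll(user);
        return merged;
    }
}
```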

[[[TODO: support multiple homographs with the same part of speech.]]]

Field Summary
`protected boolean`	`tokenizeOnLoad` If true, the phone string is replaced with the phone array in the hashmap when the phone array is loaded.
`protected boolean`	`tokenizeOnLookup` If true, the phone string is replaced with the phone array in the hashmap when the phone array is first looked up.

Constructor Summary
`LexiconImpl()` Class constructor for an empty Lexicon.
`LexiconImpl(java.net.URL compiledURL, java.net.URL addendaURL, java.net.URL letterToSoundURL, boolean binary)` Creates a new LexiconImpl by reading from the given URLs.

Method Summary
`void`	`addAddendum(java.lang.String word, java.lang.String partOfSpeech, java.lang.String[] phones)` Adds a word to the addenda.
`boolean`	`compare(LexiconImpl other)` Tests to see if this lexicon is identical to the other for debugging purposes.
`protected java.util.Map`	`createLexicon(java.io.InputStream is, boolean binary, int estimatedSize)` Reads the given input stream as lexicon data and returns the results in a `Map`.
`void`	`dumpBinary(java.lang.String path)` Dumps this lexicon (just the compiled form).
`protected static java.lang.String`	`fixPartOfSpeech(java.lang.String partOfSpeech)` Fixes the part of speech if it is `null`.
`protected java.lang.String[]`	`getPhones(java.util.Map lexicon, java.lang.String wordAndPartOfSpeech)` Gets a phone list for a word from a given lexicon.
`protected java.lang.String[]`	`getPhones(java.util.Map lexicon, java.lang.String word, java.lang.String partOfSpeech)` Gets a phone list for a word from a given lexicon.
`protected java.lang.String[]`	`getPhones(java.lang.String phones)` Turns the phone `String` into a `String[]`, using " " as the delimiter.
`java.lang.String[]`	`getPhones(java.lang.String word, java.lang.String partOfSpeech)` Gets the phone list for a given word.
`java.lang.String[]`	`getPhones(java.lang.String word, java.lang.String partOfSpeech, boolean useLTS)` Gets the phone list for a given word.
`boolean`	`isLoaded()` Determines if this lexicon is loaded.
`void`	`load()` Loads the data for this lexicon.
`protected java.util.Map`	`loadTextLexicon(java.io.InputStream is, int estimatedSize)` Reads the given input stream as text lexicon data and returns the results in a `Map`.
`protected void`	`parseAndAdd(java.util.Map lexicon, java.lang.String line)` Creates a word from the given input line and adds it to the lexicon.
`void`	`removeAddendum(java.lang.String word, java.lang.String partOfSpeech)` Removes a word from the addenda.
`protected void`	`setLexiconParameters(java.net.URL compiledURL, java.net.URL addendaURL, java.net.URL letterToSoundURL, boolean binary)` Sets the lexicon parameters.

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface com.sun.speech.freetts.lexicon.Lexicon

isSyllableBoundary

Field Detail

tokenizeOnLoad

protected boolean tokenizeOnLoad

If true, the phone string is replaced with the phone array in the hashmap when the phone array is loaded. The side effects of this are quicker lookups, but more memory usage and a longer startup time.

tokenizeOnLookup

protected boolean tokenizeOnLookup

If true, the phone string is replaced with the phone array in the hashmap when the phone array is first looked up. Set by cmufilelex.tokenize=lookup.
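The replacement described for these two fields can be sketched with a map whose values are either an unparsed `String` or a parsed `String[]`. The class `LazyTokenizeSketch` and its methods are hypothetical; this is one plausible shape for the idea, not the class's actual internals.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: caches the parsed phone array on first lookup. */
class LazyTokenizeSketch {
    // Values are either a raw phone String or an already parsed String[].
    private final Map<String, Object> entries = new HashMap<String, Object>();
    private final boolean tokenizeOnLookup;

    LazyTokenizeSketch(boolean tokenizeOnLookup) {
        this.tokenizeOnLookup = tokenizeOnLookup;
    }

    void put(String key, String phoneString) {
        entries.put(key, phoneString);
    }

    String[] lookup(String key) {
        Object value = entries.get(key);
        if (value == null) {
            return null;
        }
        if (value instanceof String[]) {
            return (String[]) value; // already tokenized
        }
        String[] phones = ((String) value).trim().split(" +");
        if (tokenizeOnLookup) {
            entries.put(key, phones); // replace the string with the array
        }
        return phones;
    }

    boolean isTokenized(String key) {
        return entries.get(key) instanceof String[];
    }
}
```

With the flag off, every lookup re-parses the string, which matches the 'never' mode: less memory, slower repeated lookups.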

Constructor Detail

LexiconImpl

public LexiconImpl(java.net.URL compiledURL,
                   java.net.URL addendaURL,
                   java.net.URL letterToSoundURL,
                   boolean binary)

Creates a new LexiconImpl by reading from the given URLs.
Parameters:: compiledURL - a URL pointing to the compiled lexicon; addendaURL - a URL pointing to lexicon addenda; letterToSoundURL - a URL pointing to the LetterToSound to use if a word cannot be found in the compiled form or the addenda; binary - if true, the input streams are binary; otherwise, they are text.

LexiconImpl

public LexiconImpl()

Class constructor for an empty Lexicon.

Method Detail

setLexiconParameters

protected void setLexiconParameters(java.net.URL compiledURL,
                                    java.net.URL addendaURL,
                                    java.net.URL letterToSoundURL,
                                    boolean binary)

Sets the lexicon parameters.

Parameters:: compiledURL - a URL pointing to the compiled lexicon; addendaURL - a URL pointing to lexicon addenda; letterToSoundURL - a URL pointing to the LetterToSound to use; binary - if true, the input streams are binary; otherwise, they are text.

isLoaded

public boolean isLoaded()

Determines if this lexicon is loaded.

Specified by:: isLoaded in interface Lexicon

Returns:: true if the lexicon is loaded

load

public void load()
          throws java.io.IOException

Loads the data for this lexicon.

Specified by:: load in interface Lexicon

Throws:: java.io.IOException - if errors occur during loading

createLexicon

protected java.util.Map createLexicon(java.io.InputStream is,
                                      boolean binary,
                                      int estimatedSize)
                               throws java.io.IOException

Reads the given input stream as lexicon data and returns the results in a Map.

Parameters:: is - the input stream; binary - if true, the data is binary; estimatedSize - the estimated size of the lexicon
Throws:: java.io.IOException - if errors are encountered while reading the data

loadTextLexicon

protected java.util.Map loadTextLexicon(java.io.InputStream is,
                                        int estimatedSize)
                                 throws java.io.IOException

Reads the given input stream as text lexicon data and returns the results in a Map.

Parameters:: is - the input stream; estimatedSize - the estimated number of entries of the lexicon
Throws:: java.io.IOException - if errors are encountered while reading the data

parseAndAdd

protected void parseAndAdd(java.util.Map lexicon,
                           java.lang.String line)

Creates a word from the given input line and adds it to the lexicon.

Parameters:: lexicon - the lexicon; line - the input text
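The text lexicon line format is not specified here. Purely as an assumption for illustration, the sketch below treats the first whitespace-delimited token as the key (word and part of speech) and the remainder as the phone string. The class `ParseLineSketch` is a hypothetical name.

```java
import java.util.Map;

/** Hypothetical sketch: splits "key phones..." and stores it in a map. */
class ParseLineSketch {
    static void parseAndAdd(Map<String, String> lexicon, String line) {
        String trimmed = line.trim();
        // Find the end of the first token (assumed to be word + part of speech).
        int split = 0;
        while (split < trimmed.length()
                && !Character.isWhitespace(trimmed.charAt(split))) {
            split++;
        }
        if (split == 0 || split == trimmed.length()) {
            return; // blank line or no phones: ignored in this sketch
        }
        String key = trimmed.substring(0, split);
        String phones = trimmed.substring(split).trim();
        lexicon.put(key, phones);
    }
}
```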

getPhones

public java.lang.String[] getPhones(java.lang.String word,
                                    java.lang.String partOfSpeech)

Gets the phone list for a given word. If a phone list cannot be found, returns null. The format is lexicon dependent. If the part of speech does not matter, pass in null.

Specified by:: getPhones in interface Lexicon

Parameters:: word - the word to find; partOfSpeech - the part of speech
Returns:: the list of phones for word or null

getPhones

public java.lang.String[] getPhones(java.lang.String word,
                                    java.lang.String partOfSpeech,
                                    boolean useLTS)

Gets the phone list for a given word. If a phone list cannot be found, null is returned. The partOfSpeech is implementation dependent, but null always matches.

Specified by:: getPhones in interface Lexicon

Parameters:: word - the word to find; partOfSpeech - the part of speech or null; useLTS - whether to use the letter-to-sound rules when the word is not in the lexicon.
Returns:: the list of phones for word or null
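One way to picture the lookup with a letter-to-sound fallback is sketched below. The search order (addenda first, then the compiled lexicon), the `Function`-based stand-in for the letter-to-sound rules, and the class name `LookupOrderSketch` are all assumptions made for illustration; the null part of speech is mapped to "0", the default that `fixPartOfSpeech` documents.

```java
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: addenda, then compiled lexicon, then optional LTS. */
class LookupOrderSketch {
    static String[] getPhones(Map<String, String[]> addenda,
                              Map<String, String[]> compiled,
                              Function<String, String[]> letterToSound,
                              String word, String partOfSpeech, boolean useLTS) {
        // A null part of speech is represented as "0".
        String key = word + (partOfSpeech == null ? "0" : partOfSpeech);
        String[] phones = addenda.get(key);
        if (phones == null) {
            phones = compiled.get(key);
        }
        if (phones == null && useLTS) {
            phones = letterToSound.apply(word); // rules-based fallback
        }
        return phones; // may still be null when useLTS is false
    }
}
```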

getPhones

protected java.lang.String[] getPhones(java.util.Map lexicon,
                                       java.lang.String word,
                                       java.lang.String partOfSpeech)

Gets a phone list for a word from a given lexicon. If a phone list cannot be found, returns null. The format is lexicon dependent. If the part of speech does not matter, pass in null.

Parameters:: lexicon - the lexicon; word - the word to find; partOfSpeech - the part of speech
Returns:: the list of phones for word or null

getPhones

protected java.lang.String[] getPhones(java.util.Map lexicon,
                                       java.lang.String wordAndPartOfSpeech)

Gets a phone list for a word from a given lexicon. If a phone list cannot be found, returns null.

Parameters:: lexicon - the lexicon; wordAndPartOfSpeech - word and part of speech concatenated together
Returns:: the list of phones for word or null

getPhones

protected java.lang.String[] getPhones(java.lang.String phones)

Turns the phone String into a String[], using " " as the delimiter.

Parameters:: phones - the phones
Returns:: the phones split into an array
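A minimal sketch of the split, assuming a `StringTokenizer` with " " as the only delimiter (the class name `PhoneSplitSketch` is hypothetical, and the real method may differ in how it handles repeated spaces):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

/** Hypothetical sketch: turns "hh ax l ow1" into {"hh", "ax", "l", "ow1"}. */
class PhoneSplitSketch {
    static String[] getPhones(String phones) {
        // StringTokenizer skips empty tokens, so repeated spaces are harmless.
        StringTokenizer tokens = new StringTokenizer(phones, " ");
        List<String> result = new ArrayList<String>();
        while (tokens.hasMoreTokens()) {
            result.add(tokens.nextToken());
        }
        return result.toArray(new String[0]);
    }
}
```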

addAddendum

public void addAddendum(java.lang.String word,
                        java.lang.String partOfSpeech,
                        java.lang.String[] phones)

Adds a word to the addenda.

Specified by:: addAddendum in interface Lexicon

Parameters:: word - the word to add; partOfSpeech - the part of speech; phones - the phones for the word

removeAddendum

public void removeAddendum(java.lang.String word,
                           java.lang.String partOfSpeech)

Removes a word from the addenda.

Specified by:: removeAddendum in interface Lexicon

Parameters:: word - the word to remove; partOfSpeech - the part of speech

dumpBinary

public void dumpBinary(java.lang.String path)

Dumps this lexicon (just the compiled form). The lexicon is dumped to two binary files, PATH_compiled.bin and PATH_addenda.bin.

Parameters:: path - the root path to dump it to
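The naming scheme can be shown with a small helper that derives the two file names from the root path. The helper `targetFiles` and the class `DumpPathSketch` are hypothetical; only the two suffixes come from the description above.

```java
/** Hypothetical sketch: derives the two dump file names from a root path. */
class DumpPathSketch {
    static String[] targetFiles(String path) {
        return new String[] {
            path + "_compiled.bin", // compiled lexicon
            path + "_addenda.bin"   // addenda
        };
    }
}
```

For a root path of `/tmp/cmulex`, this yields `/tmp/cmulex_compiled.bin` and `/tmp/cmulex_addenda.bin`.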

compare

public boolean compare(LexiconImpl other)

Tests to see if this lexicon is identical to the other for debugging purposes.

Parameters:: other - the other lexicon to compare to
Returns:: true if lexicons are identical
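As a sketch of what an entry-by-entry comparison could look like, assuming each lexicon is held as a map from key to phone array: `Map.equals` is not enough here because Java arrays compare by identity, so the values are compared with `Arrays.equals`. The class `LexiconCompareSketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Map;

/** Hypothetical sketch: compares two lexicon maps entry by entry. */
class LexiconCompareSketch {
    static boolean sameEntries(Map<String, String[]> a, Map<String, String[]> b) {
        if (a.size() != b.size()) {
            return false;
        }
        for (Map.Entry<String, String[]> entry : a.entrySet()) {
            String[] other = b.get(entry.getKey());
            // Arrays.equals compares contents; String[].equals would not.
            if (other == null || !Arrays.equals(entry.getValue(), other)) {
                return false;
            }
        }
        return true;
    }
}
```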

fixPartOfSpeech

protected static java.lang.String fixPartOfSpeech(java.lang.String partOfSpeech)

Fixes the part of speech if it is null. The default representation of a null part of speech is the number "0".
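The rule is small enough to sketch directly. The helper `key`, which appends the part of speech to the word, is an assumption based on the `wordAndPartOfSpeech` parameter described earlier; the class name `PartOfSpeechSketch` is hypothetical.

```java
/** Hypothetical sketch: null part of speech becomes "0". */
class PartOfSpeechSketch {
    static String fixPartOfSpeech(String partOfSpeech) {
        return (partOfSpeech == null) ? "0" : partOfSpeech;
    }

    /** Assumed key layout: the word followed by its part of speech. */
    static String key(String word, String partOfSpeech) {
        return word + fixPartOfSpeech(partOfSpeech);
    }
}
```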
