com.sun.speech.freetts.en
Class TokenizerImpl

java.lang.Object
  extended by com.sun.speech.freetts.en.TokenizerImpl
All Implemented Interfaces:
Tokenizer

public class TokenizerImpl
extends java.lang.Object
implements Tokenizer

Implements the Tokenizer interface. Breaks an input sequence of characters into a sequence of tokens.
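Judging from the pre- and post-punctuation settings, a token presumably separates a word from the punctuation attached before and after it. The sketch below illustrates that split for one whitespace-delimited chunk. PunctuationSplitSketch and its PRE/POST symbol strings are hypothetical and illustrative; they are not the FreeTTS defaults and not TokenizerImpl's code.

```java
/**
 * Hypothetical sketch (not FreeTTS code): splits one whitespace-delimited
 * chunk into {prepunctuation, word, postpunctuation}.
 */
final class PunctuationSplitSketch {

    // Illustrative symbol sets only; the real defaults are listed on the
    // Constant Field Values page and may differ.
    static final String PRE = "\"'([{";
    static final String POST = "\"'.,;:!?)]}";

    static String[] split(String chunk) {
        int start = 0;
        // Consume leading characters that are prepunctuation symbols.
        while (start < chunk.length() && PRE.indexOf(chunk.charAt(start)) >= 0) {
            start++;
        }
        int end = chunk.length();
        // Consume trailing postpunctuation, never crossing into the prepunctuation.
        while (end > start && POST.indexOf(chunk.charAt(end - 1)) >= 0) {
            end--;
        }
        return new String[] {
            chunk.substring(0, start),   // prepunctuation
            chunk.substring(start, end), // word
            chunk.substring(end)         // postpunctuation
        };
    }
}
```

For example, split("(Hello,") yields "(", "Hello" and ",". A chunk made only of punctuation, such as "...", comes back with an empty word; how TokenizerImpl handles that case is not described on this page.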


Field Summary
static java.lang.String DEFAULT_POSTPUNCTUATION_SYMBOLS
          A string containing the default post-punctuation characters.
static java.lang.String DEFAULT_PREPUNCTUATION_SYMBOLS
          A string containing the default pre-punctuation characters.
static java.lang.String DEFAULT_SINGLE_CHAR_SYMBOLS
          A string containing the default single-character symbols.
static java.lang.String DEFAULT_WHITESPACE_SYMBOLS
          A string containing the default whitespace characters.
static int EOF
          A constant indicating that the end of the stream has been read.
 
Constructor Summary
TokenizerImpl()
          Constructs a Tokenizer.
TokenizerImpl(java.io.Reader file)
          Creates a tokenizer that will return tokens read from the given Reader.
TokenizerImpl(java.lang.String string)
          Creates a tokenizer that will return tokens from the given string.
 
Method Summary
 java.lang.String getErrorDescription()
          If hasErrors() returns true, returns a description of the error encountered; otherwise returns null.
 Token getNextToken()
          Returns the next token.
 boolean hasErrors()
          Returns true if there were errors while reading tokens.
 boolean hasMoreTokens()
          Returns true if there are more tokens, false otherwise.
 boolean isBreak()
          Determines if the current token should start a new sentence.
 void setInputReader(java.io.Reader reader)
          Sets the reader from which input is read.
 void setInputText(java.lang.String inputString)
          Sets the text to tokenize.
 void setPostpunctuationSymbols(java.lang.String symbols)
          Sets the postpunctuation symbols of this Tokenizer to the given symbols.
 void setPrepunctuationSymbols(java.lang.String symbols)
          Sets the prepunctuation symbols of this Tokenizer to the given symbols.
 void setSingleCharSymbols(java.lang.String symbols)
          Sets the single character symbols of this Tokenizer to the given symbols.
 void setWhitespaceSymbols(java.lang.String symbols)
          Sets the whitespace symbols of this Tokenizer to the given symbols.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

EOF

public static final int EOF
A constant indicating that the end of the stream has been read.

See Also:
Constant Field Values
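The numeric value of EOF is on the Constant Field Values page and is not reproduced here. The sketch below instead relies on java.io.Reader.read(), which returns -1 at the end of the stream; that is also why read() returns an int rather than a char, since -1 can never collide with a real character value. EofSketch and its local EOF constant are hypothetical, and whether TokenizerImpl.EOF equals -1 is an assumption to check.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical sketch: drain a Reader until the end-of-stream sentinel. */
final class EofSketch {

    // Local stand-in equal to Reader's end-of-stream value (-1).
    // Whether TokenizerImpl.EOF has this same value is an assumption.
    static final int EOF = -1;

    /** Reads characters until read() reports end of stream. */
    static String readAll(Reader reader) throws IOException {
        StringBuilder out = new StringBuilder();
        int c;
        // read() returns an int so that EOF (-1) is distinct from every char value.
        while ((c = reader.read()) != EOF) {
            out.append((char) c);
        }
        return out.toString();
    }

    /** Convenience overload for in-memory text. */
    static String readAll(String text) {
        try {
            return readAll(new StringReader(text));
        } catch (IOException e) {
            // A fresh StringReader only throws after being closed, so this is unexpected.
            throw new UncheckedIOException(e);
        }
    }
}
```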

DEFAULT_WHITESPACE_SYMBOLS

public static final java.lang.String DEFAULT_WHITESPACE_SYMBOLS
A string containing the default whitespace characters.

See Also:
Constant Field Values

DEFAULT_SINGLE_CHAR_SYMBOLS

public static final java.lang.String DEFAULT_SINGLE_CHAR_SYMBOLS
A string containing the default single-character symbols.

See Also:
Constant Field Values

DEFAULT_PREPUNCTUATION_SYMBOLS

public static final java.lang.String DEFAULT_PREPUNCTUATION_SYMBOLS
A string containing the default pre-punctuation characters.

See Also:
Constant Field Values

DEFAULT_POSTPUNCTUATION_SYMBOLS

public static final java.lang.String DEFAULT_POSTPUNCTUATION_SYMBOLS
A string containing the default post-punctuation characters.

See Also:
Constant Field Values

Constructor Detail

TokenizerImpl

public TokenizerImpl()
Constructs a Tokenizer.


TokenizerImpl

public TokenizerImpl(java.lang.String string)
Creates a tokenizer that will return tokens from the given string.

Parameters:
string - the string to tokenize

TokenizerImpl

public TokenizerImpl(java.io.Reader file)
Creates a tokenizer that will return tokens read from the given Reader.

Parameters:
file - the Reader to read input from (it need not be backed by a file)

Method Detail

setWhitespaceSymbols

public void setWhitespaceSymbols(java.lang.String symbols)
Sets the whitespace symbols of this Tokenizer to the given symbols.

Specified by:
setWhitespaceSymbols in interface Tokenizer
Parameters:
symbols - the whitespace symbols

setSingleCharSymbols

public void setSingleCharSymbols(java.lang.String symbols)
Sets the single character symbols of this Tokenizer to the given symbols.

Specified by:
setSingleCharSymbols in interface Tokenizer
Parameters:
symbols - the single character symbols

setPrepunctuationSymbols

public void setPrepunctuationSymbols(java.lang.String symbols)
Sets the prepunctuation symbols of this Tokenizer to the given symbols.

Specified by:
setPrepunctuationSymbols in interface Tokenizer
Parameters:
symbols - the prepunctuation symbols

setPostpunctuationSymbols

public void setPostpunctuationSymbols(java.lang.String symbols)
Sets the postpunctuation symbols of this Tokenizer to the given symbols.

Specified by:
setPostpunctuationSymbols in interface Tokenizer
Parameters:
symbols - the postpunctuation symbols
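Each setter replaces one symbol set with the given string. A minimal sketch of the effect, assuming each symbol set is a plain String used as a set of characters (membership tested with indexOf) and assuming that a single-character symbol always forms a token of its own; the field name suggests this, but the page does not spell it out. ConfigurableSplitter is hypothetical, ignores pre- and post-punctuation to stay short, and is not TokenizerImpl's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in (not FreeTTS code) with configurable symbol sets. */
final class ConfigurableSplitter {

    // Illustrative starting values, not necessarily TokenizerImpl's DEFAULT_* values.
    private String whitespaceSymbols = " \t\n\r";
    private String singleCharSymbols = "";

    void setWhitespaceSymbols(String symbols) { whitespaceSymbols = symbols; }

    void setSingleCharSymbols(String symbols) { singleCharSymbols = symbols; }

    List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (whitespaceSymbols.indexOf(c) >= 0) {
                flush(current, tokens);                // whitespace only separates tokens
            } else if (singleCharSymbols.indexOf(c) >= 0) {
                flush(current, tokens);                // close any word in progress
                tokens.add(String.valueOf(c));         // the symbol stands alone
            } else {
                current.append(c);
            }
        }
        flush(current, tokens);
        return tokens;
    }

    private static void flush(StringBuilder current, List<String> tokens) {
        if (current.length() > 0) {
            tokens.add(current.toString());
            current.setLength(0);
        }
    }
}
```

With the starting values, "a+b c" splits into "a+b" and "c"; after setSingleCharSymbols("+") the same text splits into "a", "+", "b" and "c".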

setInputText

public void setInputText(java.lang.String inputString)
Sets the text to tokenize.

Specified by:
setInputText in interface Tokenizer
Parameters:
inputString - the string to tokenize

setInputReader

public void setInputReader(java.io.Reader reader)
Sets the reader from which input is read.

Specified by:
setInputReader in interface Tokenizer
Parameters:
reader - the input source

getNextToken

public Token getNextToken()
Returns the next token.

Specified by:
getNextToken in interface Tokenizer
Returns:
the next token if it exists; null if there are no more tokens

hasMoreTokens

public boolean hasMoreTokens()
Returns true if there are more tokens, false otherwise.

Specified by:
hasMoreTokens in interface Tokenizer
Returns:
true if there are more tokens; false otherwise
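hasMoreTokens() and getNextToken() are meant to be used together: check, then fetch, with getNextToken() returning null once the input is exhausted. With the real class the loop has the shape while (tokenizer.hasMoreTokens()) { Token t = tokenizer.getNextToken(); ... }. WordCursor below is a hypothetical stand-in, not TokenizerImpl: it returns plain Strings instead of Token objects and splits only on Character.isWhitespace. Its setInputText mirrors the documented method by restarting on new text.

```java
/** Hypothetical stand-in (not FreeTTS code) for the hasMoreTokens/getNextToken contract. */
final class WordCursor {

    private String text = "";
    private int pos = 0;

    WordCursor(String text) { setInputText(text); }

    /** Replaces the input and restarts from its beginning. */
    void setInputText(String inputString) {
        text = inputString;
        pos = 0;
    }

    boolean hasMoreTokens() {
        skipWhitespace();
        return pos < text.length();
    }

    /** Returns the next whitespace-delimited word, or null when none remain. */
    String getNextToken() {
        if (!hasMoreTokens()) {
            return null;
        }
        int start = pos;
        while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        return text.substring(start, pos);
    }

    private void skipWhitespace() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
    }

    /** The usual consumption loop: check, then fetch. */
    static int countTokens(String text) {
        WordCursor cursor = new WordCursor(text);
        int n = 0;
        while (cursor.hasMoreTokens()) {
            cursor.getNextToken();
            n++;
        }
        return n;
    }
}
```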

hasErrors

public boolean hasErrors()
Returns true if there were errors while reading tokens.

Specified by:
hasErrors in interface Tokenizer
Returns:
true if there were errors; false otherwise

getErrorDescription

public java.lang.String getErrorDescription()
If hasErrors() returns true, returns a description of the error encountered; otherwise returns null.

Specified by:
getErrorDescription in interface Tokenizer
Returns:
a description of the last error that occurred, or null if there were no errors
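hasErrors() and getErrorDescription() suggest that read failures are recorded for the caller to inspect rather than thrown. A minimal sketch of that pattern, assuming the failure source is an IOException from the underlying Reader. ErrorRecordingReader is hypothetical and does not reflect how TokenizerImpl actually stores its errors.

```java
import java.io.IOException;
import java.io.Reader;

/** Hypothetical sketch (not FreeTTS code): record read errors instead of throwing. */
final class ErrorRecordingReader {

    private final Reader reader;
    private String errorDescription = null;   // null means "no error so far"

    ErrorRecordingReader(Reader reader) { this.reader = reader; }

    /** Returns the next char, or -1 at end of stream or after an error. */
    int nextChar() {
        try {
            return reader.read();
        } catch (IOException e) {
            // toString() is never null, unlike getMessage(), so hasErrors() stays reliable.
            errorDescription = e.toString();
            return -1;                         // treat an error as end of input
        }
    }

    boolean hasErrors() { return errorDescription != null; }

    String getErrorDescription() { return errorDescription; }
}
```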

isBreak

public boolean isBreak()
Determines if the current token should start a new sentence.

Specified by:
isBreak in interface Tokenizer
Returns:
true if a new sentence should be started
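This page does not say what evidence isBreak() weighs. The sketch below is one common style of heuristic, not the rule TokenizerImpl uses: a blank line, a "?" or "!" after the previous word, or a period followed by a capitalized word. SentenceBreakSketch and its three parameters are hypothetical.

```java
/** Hypothetical sentence-break heuristic; NOT TokenizerImpl's actual rule. */
final class SentenceBreakSketch {

    /**
     * @param prevPostpunctuation punctuation that followed the previous word
     * @param whitespace          whitespace between the previous and current token
     * @param currentWord         the current token's word
     */
    static boolean isBreak(String prevPostpunctuation, String whitespace, String currentWord) {
        // Two or more newlines (a blank line) are treated as a sentence break.
        if (whitespace.indexOf('\n') != whitespace.lastIndexOf('\n')) {
            return true;
        }
        // '?' or '!' after the previous word ends the sentence.
        if (prevPostpunctuation.indexOf('?') >= 0 || prevPostpunctuation.indexOf('!') >= 0) {
            return true;
        }
        // A period counts only when the current word starts with an uppercase letter.
        return prevPostpunctuation.indexOf('.') >= 0
                && !currentWord.isEmpty()
                && Character.isUpperCase(currentWord.charAt(0));
    }
}
```

The capitalization test filters out cases like "e.g. this", but it would wrongly report a break after "Dr." in "Dr. Smith", which is why a heuristic this simple is only a starting point.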