FreeTTS 1.2 - A speech synthesizer written entirely in the Java™ programming language
FreeTTS is a speech synthesis system written entirely in the Java™ programming language. It is based upon Flite: a small run-time speech synthesis engine developed at Carnegie Mellon University. Flite is derived from the Festival Speech Synthesis System from the University of Edinburgh and the FestVox project from Carnegie Mellon University.
This release of FreeTTS includes:
FreeTTS was built by the Speech Integration Group of Sun Microsystems Laboratories:
You can contact the Sun Microsystems Speech Integration Group through the FreeTTS Forums.
FreeTTS is based on CMU's Flite, written by:
Kevin and Alan generated the data used by FreeTTS. In addition, Kevin is the voice behind the diphone voices (kevin 8k, kevin 16k), and Alan is the voice behind the speaking clock.
Support for MBROLA voice output was contributed by Marc Schröder, text-to-speech Researcher in the Language Technology Lab at DFKI, Saarbrücken, Germany.
Support for importing FestVox voices into FreeTTS, and support for dynamically discovering and loading voices was developed by David Vos, a Sun Microsystems Laboratories student intern.
Here are a few possible uses of FreeTTS:
We welcome contributions to FreeTTS. If you have code or fixes you would like to submit, please contact the FreeTTS team at freetts-contacts@sourceforge.net. The terms for contributing code are generous and are as follows:
These terms are for your and our protection and help ensure FreeTTS continues to be a viable and successful open source project.
Refer to acknowledgments to see the list of people and organizations we would like to thank for making this project possible. Most of all, we thank our management for letting us do this, and Alan Black and Kevin Lenzo for doing Flite.
FreeTTS has been tested on the Solaris™ Operating Environment, Mac OS X, Linux, and Win32 operating systems.
Running, building, and testing FreeTTS requires the Java™ 2 SDK, Standard Edition, 1.4. You can download the developer kit from http://java.sun.com/j2se/1.4/. Make sure you set your JAVA_HOME environment variable to point to your installation (e.g., JAVA_HOME=/usr/java/j2sdk1.4.0).
FreeTTS has three packages available for download:
If you plan on just creating applications with FreeTTS, the bin package will be sufficient. If you plan on making modifications to FreeTTS itself, however, you should use the src package. The tst package will be useful if you wish to make sure any changes you made to FreeTTS did not introduce any bugs or regressions.
Download and unpack the package(s) appropriate for what you want to do. Depending upon what you download, you will end up with all or part of the following directory structure:
bin          Binaries for the demos
build.xml    Ant file for building the sources
com          FreeTTS sources
de           Sources for MBROLA support
demo         Sources for the demos
demo.xml     Ant file for building the demos
docs         System documentation
javadoc      Javadoc for FreeTTS
lib          Jars for FreeTTS
mbrola       Support for MBROLA
tests        Sources and scripts for JUnit and regression tests
tools        Tools for importing CMU ARCTIC and FestVox voice data
FreeTTS makes liberal use of the "Class-Path" attribute of a jar Manifest. As such, you need to place very little in your classpath when you run applications. The only things you need to do are the following:
Note that the demonstration applications also use a jar Manifest that uses the "Class-Path" attribute. The build places the jar files for FreeTTS in the lib directory, and the jar files for the demos in the bin directory. The jar manifests for the demos depend on the lib and bin directories being in the same top level directory. If you change this, the demos may not work properly.
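To illustrate why the lib and bin directories must stay side by side, here is a minimal sketch of how a relative "Class-Path" entry resolves against the location of the jar that declares it. It uses only the standard java.util.jar and java.net classes. The manifest text, the install path /opt/freetts, and the jar name HelloWorld.jar are assumptions made up for the example, not values taken from the FreeTTS build.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

class ManifestClassPathSketch {

    /** Parses manifest text held in memory (a real jar would supply this itself). */
    static Manifest parse(String text) {
        try {
            return new Manifest(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Resolves each Class-Path entry relative to the jar that declares it. */
    static List<URI> resolveClassPath(Manifest manifest, URI jarLocation) {
        List<URI> resolved = new ArrayList<>();
        String classPath =
            manifest.getMainAttributes().getValue(Attributes.Name.CLASS_PATH);
        if (classPath == null) {
            return resolved;
        }
        // Entries are separated by whitespace.
        for (String entry : classPath.trim().split("\\s+")) {
            if (!entry.isEmpty()) {
                resolved.add(jarLocation.resolve(entry));
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Hypothetical manifest for a demo jar that lives in bin/.
        Manifest manifest = parse(
            "Manifest-Version: 1.0\nClass-Path: ../lib/freetts.jar\n\n");
        // Hypothetical install location of that demo jar.
        URI demoJar = URI.create("file:/opt/freetts/bin/HelloWorld.jar");
        System.out.println(resolveClassPath(manifest, demoJar));
    }
}
```

If the demo jar were moved out of bin without moving lib along with it, the same relative entry would resolve to a location where no FreeTTS jar exists.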
If you are not interested in building FreeTTS, then you only need to download the FreeTTS binary distribution from the FreeTTS Download Page. Once you've downloaded and unpacked the FreeTTS binary distribution, perform the following steps:
We have provided a number of demonstration applications that use FreeTTS. We highly suggest that you use these as examples for how to create your own applications. As noted above, FreeTTS makes liberal use of the "Class-Path" attribute of a jar Manifest. As such, you need to place very little in your classpath when you run applications. The only things you need to do are the following:
The prerequisites for building FreeTTS are as follows:
JAVA_HOME environment variable to point to where you installed it.
junit.jar to the lib directory of your Apache Ant installation.
To build FreeTTS, merely type the following in a command prompt situated at the top level FreeTTS directory:
ant
This executes the Apache Ant command to build the FreeTTS classes, voices, demos, and jar files. The output will be placed under the bld directory.
We have also provided a number of ant targets for convenience:
ant clean: deletes all the output from the build to give you a fresh start
ant javadoc: builds the javadoc documentation and places the results in the javadoc directory
ant junit: for testing only; runs the JUnit tests (see Testing FreeTTS)
FreeTTS includes a number of unit and regression tests. The unit tests verify that critical routines are working properly. The regression tests verify that the output of FreeTTS matches what is expected.
Although we test FreeTTS regularly as part of our development process, testing FreeTTS is optional for you. The prerequisites for testing FreeTTS are as follows:
sed, awk, diff, and wc. For Windows users, these tools are available with the Cygwin (http://www.cygwin.com) package. As part of the Cygwin install, make sure you select the "make" package from the "Devel" category, the "findutils" package from the "Base" category, and the "zip" package from the "Archive" category. In addition, make sure you modify your PATH environment variable to include the cygwin/bin directory before any Windows directories.
To run the unit tests for FreeTTS, type the following in a command prompt situated at the top level FreeTTS directory:
ant junit
The test output should be self-explanatory.
To run the regression tests, type the following in a command prompt situated at the FreeTTS tests directory:
./regression.sh
The test output should be self-explanatory.
FreeTTS includes a number of demos. Each demo directory has Java source file(s) containing the demo source and a 'README.html' file with brief instructions as to how to run the demo.
NOTE: The binaries for the demos exist as jar files in the bin directory of the binary distribution. If you only wish to run the demos, follow only the "Running" instructions for each demo. If you want to compile the demos, you must get the sources from the FreeTTS source distribution available on the FreeTTS Download Page.
Note also that the demonstration applications use a jar Manifest that uses the "Class-Path" attribute. The build places the jar files for FreeTTS in the lib directory, and the jar files for the demos in the bin directory. The jar manifests for the demos depend on the lib and bin directories being in the same top level directory. If you change this, the demos may not work properly.
The FreeTTS distribution includes a program that will allow you to test many of the features of FreeTTS. This program is started by running the following command:
java -jar lib/freetts.jar
NAME
    freetts - exercise the FreeTTS synthesis system

DESCRIPTION
    The lib/freetts.jar contains a main entry point that allows a user to interactively control the FreeTTS synthesizer. When invoked with no arguments, freetts will read text from the command line and convert the text to speech. freetts can also be used to convert text from a file to speech. It includes options that allow you to redirect the audio to file, as well as a number of metrics and debugging options.

OPTIONS
    There are a number of options that can be used to affect the operation of freetts as described here:

    -detailedMetrics      : turn on detailed metrics
    -dumpAudio file       : dump audio to file
    -dumpAudioTypes       : dump the possible output types
    -dumpMultiAudio file  : dump audio to file
    -dumpRelations        : dump the relations
    -dumpUtterance        : dump the final utterance
    -dumpASCII file       : dump the final wave to file in ASCII form (for testing)
    -file file            : speak text from given file
    -lines file           : render lines from a file
    -help                 : shows usage information
    -metrics              : turn on metrics
    -run name             : sets the name of the run
    -silent               : don't say anything
    -streaming            : use streaming audio player
    -text say me          : speak given text (should be last argument)
    -url path             : speak text from given URL
    -verbose              : verbose output
    -version              : shows version number
    -voice VOICE          : kevin, kevin16, mbrola_us1, mbrola_us2, or mbrola_us3
    -voiceInfo            : print detailed voice info

EXAMPLES
    Interactive mode:
        % java -jar lib/freetts.jar
        Enter text: Hello World.
        <text is spoken>
        Enter text: ^D
        %

    Speaking text from a command line:
        % java -jar lib/freetts.jar -text hello world
        <text is spoken>

    Speaking text from a file:
        % java -jar lib/freetts.jar -file my_email.txt
        <text is spoken>

    Selecting an alternate voice:
        % java -jar lib/freetts.jar -voice kevin16 -text Hello World
        <text is spoken>

    Redirecting audio to a file:
        % java -jar lib/freetts.jar -dumpAudio hello.wav -text Hello World
FreeTTS now has the ability to import voice data from FestVox (US English only). With this, you can record your own voice using the FestVox tools, and then turn the resulting data into a FreeTTS voice.
Visit our FestVoxToFreeTTS page to learn how to create your own voices for FreeTTS. It's not trivial, and it requires using Festival and FestVox.
FreeTTS now has the ability to import CMU ARCTIC voice data from FestVox (US English only). The CMU ARCTIC voices are quite large and require a little extra work, so we created tools just for these voices.
Visit our ArcticToFreeTTS page to learn how to import the CMU ARCTIC voices into FreeTTS.
Some of the many compelling reasons to use Java 2 SDK, Standard Edition, v1.4 are:
The New I/O (java.nio) package, which provides memory-mapped file I/O. This package drastically reduces the load times of the FreeTTS databases.
Regular expressions (java.util.regex). They are used in the FreeTTS text normalization step.
The assert keyword, to ensure that certain conditions are satisfied before continuing execution. FreeTTS uses this keyword in all stages of speech synthesis.
With the -server switch, byte codes are optimized to eliminate bounds checking on array accesses whenever possible.

We compared the performance of FreeTTS with that of Flite (original C version) on a machine with this configuration:
We rendered the first two chapters of Alice's Adventures in Wonderland by Lewis Carroll (about 20 minutes of text), and the entire text of Jules Verne's Journey to the Center of the Earth (about 8 hours of text) using both Flite and FreeTTS. The results are summarized below:
Single CPU 296MHz SPARC v9 | Flite | FreeTTS |
Loading Time for 'Alice' text | 0.0s | 4.1s |
Processing Time for 'Alice' text | 43.7s | 24.1s |
Loading Time for 'Journey' text | 0.0s | 7.0s |
Processing Time for 'Journey' text | 1019.2s | 341.0s |
Time to first Sample (10 word sentence) | 195ms | 41ms |
On a 2-CPU system with the following configuration:
The results are summarized below:
Dual CPU 360MHz SPARC v9 | Flite | FreeTTS |
Loading Time for 'Alice' text | 0.0s | 2.9s |
Processing Time for 'Alice' text | 35.7s | 14.2s |
Loading Time for 'Journey' text | 0.0s | 3.8s |
Processing Time for 'Journey' text | 842.7s | 189.5s |
Time to first Sample (10 word sentence) | 165ms | 33ms |
Currently, the distribution comes with these 3 voices:
Each of the demos describes how to select which voice to use.
FreeTTS also interfaces with the MBROLA synthesizer and can use MBROLA voices. There are three US English MBROLA voices available:
See Installing MBROLA Voices for more details on installing support for MBROLA voices.
NOTE: FreeTTS does not support MBROLA on the Windows platform.
First of all, we're happy to say you can now create your own voices for FreeTTS, and you can also import CMU ARCTIC voices. It's not trivial, and it requires using Festival and FestVox.
However, the ability to create your own voices doesn't explain why the current voices sound so bad. FreeTTS uses the same algorithms and voice data from Flite. Here is what the Flite README says about voice quality:
"So you've eagerly downloaded flite, compiled it and run it, now you are disappointed that is doesn't sound wonderful, sure its fast and small but what you really hoped for was the dulcit tones of a deep baritone voice that would make you desperately hang on every phrase it sang. But instead you get an 8Khz diphone voice that sounds like it came from the last millenium.

Well, first, you are right, it is an 8KHz diphone voice from the last millenium, and that was actually deliberate. As we developed flite we wanted a voice that was stable and that we could directly compare with that very same voice in Festival. Flite is an *engine*. We want to be able take voices built with the FestVox process and compile them for flite, the result should be exactly the same quality (though of course trading the size for quality in flite is also an option). The included voice is just an sample voice that was used in the testing process. We have better voices in Festival and are working on the conversion process to make it both more automatic and more robust and tunable, but we haven't done that yet, so in this first beta release. This old poor sounding voice is all we have, sorry, we'll provide you with free, high-quality, scalable, configurable, natural sounding voices for flite, in all languages and dialects, with the tools to built new voices efficiently and robustly as soon as we can. Though in the mean time, a few higher quality voices will be released with the next version."
As of FreeTTS 1.2, we provide a set of tools that allow you to import FestVox voice data directly. As such, you need to use Festival and FestVox to record your data. See our documentation for more detail on how to do this.
As of FreeTTS 1.2, we provide a set of tools that allow you to import FestVox voice data directly, and we also have tools that allow you to import CMU ARCTIC voices.
This is not a trivial task as it requires a lexicon for the language as well as various statistical data about the language. The document http://festvox.org/festvox/festvox_toc.html describes this in more detail.
With the FreeTTS test program, you can dump audio output to a file using the -dumpAudio option:
-dumpAudio filename
The audio file format can be .wav, .au, or .aif, depending on the file name. For example, if "filename" is "foo.au", the file format will be .au.
The -dumpMultiAudio option (same format as -dumpAudio) dumps audio to multiple audio files, one file per utterance. In this case, if "filename" is "foo.wav", the files are named foo0.wav, foo1.wav, foo2.wav, etc. Again, the file format is determined by the extension of the filename.
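The naming rules described above can be sketched with the standard Java Sound types. This is an illustration of the documented behavior only, not the code FreeTTS itself uses; the class and method names (DumpAudioNaming, typeFor, multiFileName) are hypothetical.

```java
import javax.sound.sampled.AudioFileFormat;

class DumpAudioNaming {

    /** Maps a file name's extension to a Java Sound file type; null if unrecognized. */
    static AudioFileFormat.Type typeFor(String fileName) {
        // The three formats named in the text: .wav, .au, and .aif.
        AudioFileFormat.Type[] candidates = {
            AudioFileFormat.Type.WAVE,
            AudioFileFormat.Type.AU,
            AudioFileFormat.Type.AIFF
        };
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return null;
        }
        String extension = fileName.substring(dot + 1);
        for (AudioFileFormat.Type type : candidates) {
            if (type.getExtension().equalsIgnoreCase(extension)) {
                return type;
            }
        }
        return null;
    }

    /** Builds the name of the index-th file: "foo.wav" with index 2 becomes "foo2.wav". */
    static String multiFileName(String fileName, int index) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return fileName + index;
        }
        return fileName.substring(0, dot) + index + fileName.substring(dot);
    }

    public static void main(String[] args) {
        System.out.println(typeFor("foo.au"));
        for (int i = 0; i < 3; i++) {
            System.out.println(multiFileName("foo.wav", i));
        }
    }
}
```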
If you are writing your own application, you can set the audio player of the FreeTTS Voice to one of the file-based audio players. See the FreeTTS API documentation for:
    // Look up the JSAPI voices offered by the synthesizer's current mode.
    desc = (SynthesizerModeDesc) synthesizer.getEngineModeDesc();
    javax.speech.synthesis.Voice[] jsapiVoices = desc.getVoices();
    javax.speech.synthesis.Voice jsapiVoice = jsapiVoices[0];

    /* Non-JSAPI modification of voice audio player */
    if (jsapiVoice instanceof com.sun.speech.freetts.jsapi.FreeTTSVoice) {
        com.sun.speech.freetts.Voice freettsVoice =
            ((com.sun.speech.freetts.jsapi.FreeTTSVoice) jsapiVoice).getVoice();
        freettsVoice.setAudioPlayer(new SingleFileAudioPlayer());
    }
No. Since FreeTTS is a speech synthesis system, none of the JSAPI 1.0 Recognition interfaces are supported. In addition, FreeTTS supports only a subset of the JSAPI 1.0 javax.speech.synthesis specification. The FreeTTS support for JSAPI 1.0 has the following restrictions:
WORD_STARTED or the MARKER_REACHED events.
The Synthesizer.phoneme() method is not implemented.
PropertyVeto exceptions are not always properly thrown when property change requests are rejected or constrained.

Note that the JSAPI specification is undergoing changes. The official work being done on JSAPI is now for JSAPI 2.0 via the Java Community Process (JCP) under JSR-113. Read more about the JCP and JSR-113 at http://www.jcp.org.
You probably need to install the JSAPI 1.0 specification implementation. See the JSAPI setup guide for more details.
You are probably trying to run with an older (pre-1.4) version of the Java runtime. To verify this, type:
% java -version
You should see something like: java version "1.4.0" or higher. If you see something older than this, such as java version "1.2.2", then you are indeed running with an older version of the Java runtime. See Prerequisites for building and running FreeTTS for more details on what is needed to run FreeTTS.
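The same check can be made from inside a program by reading the standard java.version system property. The sketch below is one possible way to do that; the class name JavaVersionCheck and the helper isAtLeast14 are hypothetical, and the parsing assumes version strings that begin with numeric components, such as "1.4.0" or "17.0.2".

```java
class JavaVersionCheck {

    /** Returns true if a java.version string denotes release 1.4 or later. */
    static boolean isAtLeast14(String version) {
        // Split on the separators that appear in version strings (".", "_", "-", "+").
        String[] parts = version.split("[._\\-+]");
        int major = Integer.parseInt(parts[0]);
        if (major > 1) {
            return true;   // newer numbering scheme, e.g. "9" or "17.0.2"
        }
        if (major < 1 || parts.length < 2) {
            return false;
        }
        return Integer.parseInt(parts[1]) >= 4;   // "1.x" scheme, e.g. "1.4.0"
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        System.out.println("java.version = " + version);
        System.out.println(isAtLeast14(version)
            ? "This runtime is 1.4 or newer."
            : "This runtime is older than 1.4.");
    }
}
```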
There are approximately 45 phonemes in the English language. FreeTTS uses a technique called diphone synthesis which uses pairs of phonemes called diphones as an index into the unit database. Not all phoneme combinations occur in the English language. FreeTTS, in order to conserve space, does not include diphone information for diphones that do not naturally occur. This message indicates that FreeTTS encountered one of these omitted diphones. This generally occurs when FreeTTS tries to speak gibberish or non-English text.
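As a rough illustration of the lookup described above, the sketch below pairs adjacent phonemes into diphone names and reports any pair that is absent from a toy unit index. Everything in it is an assumption made for the example: the phoneme symbols, the "a-b" naming of diphones, and the contents of the index are hypothetical and do not reflect the actual FreeTTS database format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DiphoneLookupSketch {

    /** Pairs each phoneme with its successor, e.g. [hh, ax, l] becomes [hh-ax, ax-l]. */
    static List<String> toDiphones(List<String> phonemes) {
        List<String> diphones = new ArrayList<>();
        for (int i = 0; i + 1 < phonemes.size(); i++) {
            diphones.add(phonemes.get(i) + "-" + phonemes.get(i + 1));
        }
        return diphones;
    }

    /** Returns the diphones that have no entry in the unit index. */
    static List<String> missing(List<String> diphones, Map<String, Integer> unitIndex) {
        List<String> absent = new ArrayList<>();
        for (String diphone : diphones) {
            if (!unitIndex.containsKey(diphone)) {
                absent.add(diphone);
            }
        }
        return absent;
    }

    public static void main(String[] args) {
        // Toy index mapping a diphone name to a unit number (hypothetical data).
        Map<String, Integer> unitIndex = new HashMap<>();
        unitIndex.put("pau-hh", 0);
        unitIndex.put("hh-ax", 1);
        unitIndex.put("ax-l", 2);

        List<String> phonemes = Arrays.asList("pau", "hh", "ax", "l", "zh");
        List<String> diphones = toDiphones(phonemes);
        System.out.println("Diphones: " + diphones);
        System.out.println("Not in index: " + missing(diphones, unitIndex));
    }
}
```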
If you run HelloWorld or another one of the FreeTTS demos and receive no messages, then FreeTTS thinks that everything is working fine, but obviously it's not if you are not hearing anything. Try running another java application that uses the javax.sound APIs. Try downloading the javasound demo from http://java.sun.com/products/java-media/sound/samples/JavaSoundDemo/ and make sure that it runs (and 'sounds') OK.
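Before downloading a separate demo, a quick check with the standard javax.sound.sampled API can show whether Java Sound sees any mixers and whether it reports a playback line at all. The sketch below plays no audio. The class name SoundCheck is hypothetical, and the 16 kHz, 16-bit, mono, signed, big-endian format is an assumption chosen for illustration rather than a format confirmed by the FreeTTS documentation.

```java
import java.util.ArrayList;
import java.util.List;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.Mixer;
import javax.sound.sampled.SourceDataLine;

class SoundCheck {

    /** Names of the mixers Java Sound can see; an empty list means none were found. */
    static List<String> mixerNames() {
        List<String> names = new ArrayList<>();
        for (Mixer.Info info : AudioSystem.getMixerInfo()) {
            names.add(info.getName());
        }
        return names;
    }

    /** True if Java Sound reports a playback line for 16-bit mono PCM at this rate. */
    static boolean playbackSupported(float sampleRate) {
        AudioFormat format = new AudioFormat(sampleRate, 16, 1, true, true);
        DataLine.Info lineInfo = new DataLine.Info(SourceDataLine.class, format);
        return AudioSystem.isLineSupported(lineInfo);
    }

    public static void main(String[] args) {
        System.out.println("Mixers: " + mixerNames());
        System.out.println("16 kHz playback line available: " + playbackSupported(16000f));
    }
}
```

If no mixers are listed or no playback line is reported, the problem most likely lies with the Java Sound setup on the machine rather than with FreeTTS.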
We do not recommend trying this. It's a quagmire of complexity and the end user experience is not what you expect. The JSAPI layer of FreeTTS will attempt to access the speech.properties file in the user's home directory. The applet security mechanism will not allow such access and will throw a SecurityException if such an attempt is made.
Instead of using FreeTTS in an applet, we highly recommend you consider using Java Web Start.
FreeTTS includes a WebStartClock demo application that demonstrates how to write a Web Start application that uses FreeTTS.
speech.properties?
JSAPI's Central class looks for a file named speech.properties. To bypass this, use a different mechanism. A detailed discussion of this approach is in the FreeTTS Help forum: Avoiding the need for speech.properties.
The WebStartClock demo also provides an example of how to do this:
    public void createSynthesizer() {
        try {
            SynthesizerModeDesc desc =
                new SynthesizerModeDesc(null,
                                        "time",  /* use "time" or "general" */
                                        Locale.US,
                                        Boolean.FALSE,
                                        null);

            FreeTTSEngineCentral central = new FreeTTSEngineCentral();
            EngineList list = central.createEngineList(desc);

            if (list.size() > 0) {
                EngineCreate creator = (EngineCreate) list.get(0);
                synthesizer = (Synthesizer) creator.createEngine();
            }
            if (synthesizer == null) {
                System.err.println("Cannot create synthesizer");
                System.exit(1);
            }
            synthesizer.allocate();
            synthesizer.resume();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
This message is output when FreeTTS tries to allocate sound resources and the requested resources are unavailable. This can occur for a number of reasons:
This is an implementation of a speech synthesizer and does not include a speech recognizer. Please keep your eye on the 'cmusphinx' project on SourceForge for developments in this area.