TEXTALOUD 2.0 USERS MANUAL
SAPI – Speech Application Programming Interface
As with many aspects of Windows computing, Microsoft has created a standard interface for
speech. SAPI, the Microsoft Speech API, is Microsoft’s attempt to give multiple companies a
standard way to develop voices and make them available through a common programming
method. Normally users never need to know about this interface. However, there are two
versions of SAPI, SAPI4 and SAPI5, with different abilities and different levels of support from
speech engines. TextAloud supports both, but some engines and some features work with only
one or the other.
SAPI4 is the earlier version of the standard. It is supported by all of the voices created before
1999, and some created later. SAPI4 will work on most Windows computers. It provides little
ability to customize speech beyond TextAloud’s basic pronunciation editor. The SAPI4 voices
most often used with TextAloud are the Microsoft voices Mary, Mike, and Sam, along with the
older L&H voices. Some users may also have IBM ViaVoice voices from other products that will
be available under SAPI4. AT&T Natural Voices also support SAPI4.
SAPI5 is the newer standard. It is supported by newer voices such as AT&T Natural Voices,
NeoSpeech voices, Cepstral voices, and other new voices we will be releasing. SAPI5 adds
features not available in SAPI4. These include the ability to change voices within a single article of
text (for example, to create a script read by multiple characters), the ability to insert pitch and
speech changes into the text, and additional specialized tags for manipulating speech further. A
few TextAloud features, including the Advanced Pronunciation Editor and Inserting Voice
Changes, are available only when you tell the program to use ONLY SAPI5 voices. This setting is
changed via Options->Voice and File Options.
NOTE: Not all voices support all of the advanced SAPI5 options. For example, NeoSpeech
voices do not support the Advanced Pronunciation Editor, and AT&T voices do not support
pitch adjustment or emphasis adjustment.
For more information on advanced speech manipulation using SAPI5 XML Tags, you can
download the SAPI documentation at
http://www.microsoft.com/speech/techinfo/apioverview/#_doc
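As a rough illustration of the SAPI5 tags described above, the following Python sketch builds a short two-character script by wrapping each line in a voice tag. The tag and attribute names (voice with required="Name=...", pitch with middle="...") are taken from Microsoft's SAPI 5 XML documentation, not from this manual. The helper functions and the voice names are assumptions made for the example, and a given voice may ignore tags it does not support.

```python
from xml.sax.saxutils import escape, quoteattr


def voiced_line(voice_name, text):
    """Wrap one line of text in a SAPI5 <voice> tag (hypothetical helper)."""
    # required="Name=..." asks SAPI5 for the installed voice with that name.
    attr = quoteattr("Name=" + voice_name)
    # escape() protects characters such as & and < that would break the XML.
    return "<voice required={}>{}</voice>".format(attr, escape(text))


def with_pitch(tagged_text, steps):
    """Wrap already-escaped text in a relative <pitch> adjustment."""
    return '<pitch middle="{:d}">{}</pitch>'.format(steps, tagged_text)


# Voice names below are assumptions; use the names installed on your computer.
script = "\n".join([
    voiced_line("Microsoft Mary", "Did you finish the report?"),
    voiced_line("Microsoft Mike", "Almost. Tom & I need one more day."),
])
print(script)
```

The printed text could then be pasted into an article as tagged text. Whether each tag has an audible effect depends on the voice, as the note above explains.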
Another important note is that installing TextAloud does not automatically install the SAPI4 or
SAPI5 interfaces. Windows 98 ships with SAPI4. Windows 2000 and XP ship with SAPI5 already
installed. If neither is available, TextAloud will warn you and give you directions for installing one
or the other. You can download and install SAPI4 and SAPI5 via
http://www.nextup.com/sapi.html
Voice Sample Rates
Audio files are digital representations of analog signals. When a sound is digitized, it is
sampled and converted to numbers. The rate at which it is sampled is the sample rate. That
sounds a little complicated, and it is. The only part you really need to remember is that each voice
engine has a natural sample rate. This rate is expressed as a number such as 8kHz, 11kHz, 16kHz,
or 22kHz. For most uses the sample rate doesn’t matter as long as you are happy with the sound
you are getting from the voice. There are a couple of uses where the sample rate does matter,
though.
Telephony usage, where voices are played over the phone, such as with voice menu systems or
computer-based answering machines, typically requires 8kHz voices. When you purchase
AT&T voices or NeoSpeech voices from NextUp.com, you’ll see an option to purchase either 8kHz
or 16kHz voices. While 16kHz voices sound best on the computer, either 8kHz or 16kHz will
work fine when played within TextAloud. However, voices purchased for telephony usage should
be 8kHz.
The other area where sample rates become important is when creating audio files for certain
portable uses. See Creating Audio Files later in this document for further discussion of this.
One thing to note when selecting voices is that sample rate alone doesn’t define the quality of the
voice. For example, Microsoft voices are 22kHz, but don’t sound nearly as good as 16kHz AT&T
voices. Within a given engine, however, higher sample rates sound better: 22kHz Microsoft
voices sound better than 8kHz Microsoft voices, and 16kHz AT&T voices sound better than
8kHz AT&T voices.
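To make the effect of the sample rate more concrete, here is a small Python sketch of the arithmetic. It assumes uncompressed 16-bit mono audio, which this manual does not specify, and it treats the 11kHz and 22kHz rates as the conventional 11,025 Hz and 22,050 Hz. The function names are hypothetical.

```python
def pcm_bytes(sample_rate_hz, seconds, bits_per_sample=16, channels=1):
    """Size in bytes of uncompressed audio (assumed 16-bit mono by default)."""
    bytes_per_sample = bits_per_sample // 8
    return int(sample_rate_hz * seconds * bytes_per_sample * channels)


def highest_frequency_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent (half the rate)."""
    return sample_rate_hz / 2


# Compare one minute of speech at each rate mentioned in this section.
for rate in (8000, 11025, 16000, 22050):
    size_kb = pcm_bytes(rate, 60) / 1024
    print("{:>6} Hz: frequencies up to {:>7.0f} Hz, about {:,.0f} KB per minute".format(
        rate, highest_frequency_hz(rate), size_kb))
```

Doubling the sample rate doubles the uncompressed size and the range of frequencies captured. That is one reason an 8kHz voice sounds flatter than a 16kHz voice from the same engine.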
Languages
Text-to-speech engines and voices are available in a variety of languages. The following
sections, which detail commonly available voices, list the languages available for each.
Each voice has a native language. TTS voices and TextAloud do not translate. For example,
if you have English text and use a Spanish voice, you will not hear a Spanish version of
that text. Likewise, voices do not do a good job of reading a language other than the one they
were designed for. Using English text with a French voice will not give you English with a French
accent; it will give you mostly gibberish.
Available Voices
Here are voices that are commonly available for use within TextAloud:
• Microsoft Voices – Originally licensed by Microsoft from Lernout & Hauspie (L&H), Mary,
Mike, and Sam are available free for download at the bottom of
http://www.nextup.com/TextAloud/SpeechEngine/voices.html
While these voices are older and do not have a natural sound, they do a decent job with
most text and are easily understood. They are available in English only, take up less
than 2 MB per voice, and are available in SAPI4 or SAPI5 versions.
• L&H Voices – These voices are of similar quality to the Microsoft voices, and are available
for download at the bottom of
http://www.nextup.com/TextAloud/SpeechEngine/voices.html
They take up less than 2 MB per voice, support only SAPI4, and are available in the
following languages:
o American English
o British English
o Dutch
o French
o German
o Italian
o Portuguese
o Spanish
• AT&T Natural Voices – These are among the most natural sounding voices around. You
can hear samples of AT&T voices at
http://www.nextup.com/attnv.html
and play with an interactive demo at
http://www.nextup.com/nvdemo.html
These voices are available for purchase at NextUp.com. They take up nearly 600 MB per
voice, support both SAPI4 and SAPI5, and are available in the following languages:
o American English
o British English
o American English (Indian Accent)
o German
o Latin American Spanish
o French
NOTE: AT&T Natural Voices use a significant amount of system memory.
We recommend 256 MB or more of memory for these voices.
• NeoSpeech Voices – These are also among the most natural sounding you will find. You
can hear samples of NeoSpeech voices at
http://www.nextup.com/neospeech.html
These voices are available for purchase at NextUp.com. They take up nearly 300 MB per
voice, support SAPI5 only, and are available in American English.
• Cepstral Voices – These are high-quality voices with smaller machine requirements. You
can hear samples of Cepstral voices at
http://www.nextup.com/Cepstral.html
These voices are available for purchase at NextUp.com. They take up around 25 MB per
voice, support SAPI5 only, and are currently available in American English, Scottish-accented
English, and a child’s English voice, with more languages coming soon.
• Other Voices – NextUp.com is adding new voices often, so be sure to check with us about
newly available voices. In addition to the voices listed above, TextAloud will attempt to
load any other SAPI4 or SAPI5 voices that may be available on your computer. If you have
other speech products, you may have received other voices, such as those from IBM or
ScanSoft. If available, these voices will be shown on the Engines/Voices tab in TextAloud,
available under Options->Voice and File Options. While we have tested with as many
voices as possible, if you find you have additional voices and they do not seem to work
properly in TextAloud, email us at support@nextup.com for assistance.
NOTE: A newly installed voice is not immediately available within TextAloud. You must
first fully exit TextAloud via File->Exit, then restart the program before the new voice
will be displayed.
A voice is assigned to each article added to TextAloud. This voice can be changed via the Voice
Dropdown list on the main window. Which voice is assigned is controlled by the settings for
Voice Selection under Options->TextAloud Options->Article Options. Voice Assignment can
be set to:
• Default – The default voice selected under Options->Voice and File Options is used.
• Random – A random voice is selected from the enabled voices. Voices can be enabled and
disabled under Options->Voice and File Options.
• Round Robin – Voices are rotated sequentially through the enabled voices.
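The three assignment settings can be pictured with a short Python sketch. This is only an illustration of the behavior described in the list, not TextAloud's actual implementation; the class name, the mode strings, and the voice names are all assumptions.

```python
import itertools
import random


class VoiceAssigner:
    """Pick a voice for each new article (illustrative only)."""

    MODES = ("default", "random", "round_robin")

    def __init__(self, enabled_voices, default_voice, mode="default", rng=None):
        if not enabled_voices:
            raise ValueError("at least one voice must be enabled")
        if mode not in self.MODES:
            raise ValueError("unknown mode: " + mode)
        self.enabled_voices = list(enabled_voices)
        self.default_voice = default_voice
        self.mode = mode
        self._rng = rng or random.Random()
        # cycle() repeats the enabled voices in order, forever.
        self._cycle = itertools.cycle(self.enabled_voices)

    def next_voice(self):
        """Return the voice to assign to the next article added."""
        if self.mode == "default":
            return self.default_voice
        if self.mode == "random":
            return self._rng.choice(self.enabled_voices)
        return next(self._cycle)  # round robin


assigner = VoiceAssigner(["Mary", "Mike", "Sam"], "Mary", mode="round_robin")
print([assigner.next_voice() for _ in range(4)])
```

With Round Robin and three enabled voices, the fourth article wraps around to the first voice again.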
Improving Speech
Judging the quality of TTS Voices is very subjective. What sounds good to one user may not be
intelligible to another. TextAloud has many features to allow you to improve the sound. The most
important thing you can do is upgrade to one or more of the higher-quality voices listed above.
Additionally, TextAloud allows you to adjust the speed, pitch, and volume of voices with the
sliders on the main window. (NOTE: AT&T Natural Voices do not support pitch adjustments.)
Most users, as they become accustomed to listening to TextAloud, will gradually increase the
speed of speech to enable faster reading.
Also see later sections on the Basic and Advanced pronunciation editors to assist with any words
that the voices mispronounce.
4.0 Single vs. Multi Article Mode
Before jumping into details of using TextAloud, the concepts of Articles and Single vs. Multi Article
mode need to be understood.
An Article is a piece of text you have placed into TextAloud to be read. It may be a single news
story, a single document, a single paragraph, whatever you want it to be. The point is that
TextAloud treats it as a single entity and gives it a single title. Understanding this concept will
help you follow the remainder of this document. If you open a text file into TextAloud, that
becomes a single article. If you copy an entire news story to TextAloud using the clipboard, that
becomes a single article.
Single Article Mode
TextAloud has two distinct modes of operation: Single Article Mode and Multi Article
Mode. You can change modes by choosing Options->Single Article or Options->Multi Article.
Most users will keep TextAloud in Multi Article Mode as detailed in the next section. However,
there are situations where a user may want to use Single Article Mode.
In Single Article Mode, you can only deal with one article at a time. Anytime a new article is
loaded, the previous article is deleted. If you typically listen to one thing and, once it is done,
go off to look for something else, then Single Article Mode may be simpler.
One powerful use of Single Article Mode is Automatic Speaking. If you check Automatically
Speak New Text under Options->TextAloud Options->Article Options, then in Single Article
Mode TextAloud will automatically speak any newly opened articles or new articles added via
Clipboard Watching.
Multi Article Mode
Place TextAloud in Multi Article Mode via Options->Multi Article. In Multi Article Mode,
TextAloud can support an unlimited number of articles. Each article is automatically assigned a
title based on the first few words in the article. You can change this article title by overwriting it in
the Title Field. When a new article is added, the new article is displayed, but all previous articles
are still available within TextAloud. You can change to a previously added article using the Article
List dropdown at the top of the window.
As mentioned, most users prefer Multi Article Mode because it builds a queue of articles. Then
when ready, you can listen to individual articles, your complete list of articles, or easily create
audio files from the entire list of articles. Based on the Automatically Delete Text After Play
setting under Options->TextAloud Options->Article Options, articles can be automatically
removed after they are spoken.
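The difference between the two modes can be sketched as a small data structure in Python. This is a hedged illustration of the behavior described in this section, not TextAloud's code. The class and method names are hypothetical, and the choice of five words for the automatic title is an assumption, since the manual only says "the first few words".

```python
class ArticleList:
    """Hold articles the way Single and Multi Article Mode are described."""

    def __init__(self, multi_article=True, auto_delete_after_play=False,
                 title_words=5):
        self.multi_article = multi_article
        self.auto_delete_after_play = auto_delete_after_play
        self.title_words = title_words  # assumed value for "first few words"
        self.articles = []  # list of (title, text) pairs, oldest first

    def add(self, text):
        """Add an article and return its automatically generated title."""
        title = " ".join(text.split()[:self.title_words])
        if not self.multi_article:
            # Single Article Mode: loading a new article deletes the old one.
            self.articles.clear()
        self.articles.append((title, text))
        return title

    def finish_playing(self, index):
        """Call after an article is spoken; removes it if auto-delete is on."""
        if self.auto_delete_after_play:
            del self.articles[index]

    def titles(self):
        return [title for title, _ in self.articles]


queue = ArticleList(multi_article=True)
queue.add("Stocks rallied on Tuesday after a strong jobs report.")
queue.add("Rain is expected across the region this weekend.")
print(queue.titles())
```

Creating the list with multi_article=False instead would leave only the most recently added article in the list.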
5.0 TextAloud Main Window Overview
The TextAloud main window is depicted below:
Numbers on the picture annotate key areas detailed as follows:
1. TextAloud Main Menu – Provides menu items to all functions in TextAloud. Note that
most menu items have shortcut keys, and using Options->Shortcut Setup allows you
to customize those keyboard shortcuts.
2. TextAloud Toolbar – These icons provide quick access to commonly used menu
items. Pause your mouse over an icon and a popup hint will provide a description of
the icon’s function. You can adjust the size of the toolbar, remove it completely, or
remove captions via the View->Toolbar menu item. The last two buttons on the toolbar
provide information about TextAloud for schools and Language support. These are
primarily marketing vehicles, and you may right-click these buttons and choose Hide on
the popup menu to remove them.
3. Article List Dropdown – This is visible only in Multi Article Mode. Click this
dropdown list to see the full list of all articles currently in TextAloud, then select an
article title to load it into the main window.
4. Article Title – This field is the Title of the current article. Article titles are generated
based on the first few words of an article when the article is first loaded. You can
change the title by typing in this field. NOTE: The article title becomes the filename
when creating audio files.
5. Voice Dropdown – The voice dropdown shows the voice selected for the current
article. You can click this field to select a different voice from enabled voices. When
you change the voice, the speed and pitch settings on the right of the screen will be
adjusted to the saved values for that voice.
6. TextAloud Text Area – This area is where text of the current article is displayed.
When manually creating an article, type or paste in this area. While the article is being
spoken, the word currently being spoken will be highlighted in this area (assuming
word highlighting is turned on under Options->TextAloud Options->Miscellaneous).
You may edit text in this area, and several actions are available on a popup menu when
you right-click in this area.
7. Status Bar – The status bar at the bottom of the window will often provide helpful
status information such as the number of current articles or progress indicators during
speaking.
8. Pitch Slider – This area contains a slider to allow easy adjustment of the pitch for the
current voice. Note that some voices do not support pitch adjustments, as shown in the
figure above. Changing a value here will adjust the default pitch for the current voice,
so the new value will be used anytime that voice is selected.
9. Speed Slider – Use this slider to speed up or slow down speaking when using the
current voice. All voices support speed adjustments. The ranges shown vary with the
voice and the SAPI version it supports. Changing a value here will adjust the default
speed for the current voice.
10. Volume Slider – The volume slider controls the WAVE volume for the Windows mixer.
This is the same volume that is shown if you bring up the Windows mixer by
double-clicking the speaker icon in the system tray area near the clock on the Windows
taskbar. Changes here are not voice specific but change Windows volume settings for most
audio programs.
11. <Speak> Button – Click this button to have the current article spoken aloud. This has
the same action as clicking Speak->Speak Current Article.
12. <Speak To File> Button – Click this button to create an audio file from the currently
selected article.
13. <Delete> Button – Deletes the current article. Same action as Edit->Delete Selected
Article.
14. Text Scrollbar – If the current article has more text than can be shown within the
TextAloud window, this scrollbar will appear and can be used to navigate up and down
the text of the article.
15. Multi Article Scrollbar – When TextAloud has more than one article, use this scrollbar
to navigate to different articles. This is useful for moving to the next or previous article, or
as a quick way to jump to the first or last. To move to a particular article, using the Article
List Dropdown is easier since you can identify the articles by title.
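The way the sliders behave, with speed and pitch remembered per voice and volume shared across all voices, can be illustrated with a brief Python sketch. The class names, the starting values of 0 and 50, and the voice names are assumptions made for the example and do not come from TextAloud itself.

```python
from dataclasses import dataclass


@dataclass
class VoiceSettings:
    speed: int = 0  # assumed neutral starting value
    pitch: int = 0  # assumed neutral starting value


class SliderState:
    """Remember speed and pitch per voice, and one shared volume."""

    def __init__(self):
        self._by_voice = {}
        self.volume = 50  # shared by all voices; assumed starting value

    def settings_for(self, voice_name):
        """Return saved settings for a voice, creating defaults if needed."""
        return self._by_voice.setdefault(voice_name, VoiceSettings())

    def set_speed(self, voice_name, speed):
        self.settings_for(voice_name).speed = speed

    def set_pitch(self, voice_name, pitch):
        self.settings_for(voice_name).pitch = pitch


state = SliderState()
state.set_speed("Mary", 3)
state.set_pitch("Mike", -1)
# Selecting Mary again restores her saved speed; Mike's value is untouched.
print(state.settings_for("Mary"), state.settings_for("Mike"))
```

Keeping volume outside the per-voice records mirrors the description of the Volume Slider, which changes a Windows-wide setting instead of a voice default.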