Bigram DMCMC: 73.0), its lexicon score is lower (LF PHOCUS: 54.5, LF Bigram DMCMC:
62.6). Thus, our online Bayesian learners seem better able to extract a reliable lexicon from the
available data than other recent statistical learners, including one (PHOCUS) that relies on
domain-specific knowledge about word well-formedness.
Second, when we examine the impact of the unigram and bigram assumptions on word
token performance, we find that the bigram learners do not always benefit from assuming words
are predictive of other words. While the Ideal, DPM and DMCMC learners do (bigram F >
unigram F, Ideal: p < .001, DPM: p = .046, DMCMC: p = .002), the DPS learner is harmed by
this bias (unigram F > bigram F: p < .001). This is also true for the lexicon F scores: While the
Ideal and DPM learners are helped (bigram F > unigram F, Ideal: p < .001, DPM: p = .002), the
DPS and DMCMC learners are harmed (unigram F > bigram F, DPS: p < .001, DMCMC: p =
.006).[end note 8]
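For reference, the token and lexicon F-scores reported throughout this section can be computed as sketched below. This is a minimal illustration with toy data; the helper names are ours, and the actual scoring scripts may differ in detail:

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def _spans(words):
    """Character spans covered by each word in a segmented utterance."""
    spans, i = set(), 0
    for w in words:
        spans.add((i, i + len(w)))
        i += len(w)
    return spans

def token_f(gold, predicted):
    """Word token F: a predicted token is correct only if it spans exactly
    the same characters as a gold token in the same utterance."""
    correct = sum(len(_spans(g) & _spans(p)) for g, p in zip(gold, predicted))
    n_pred = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    return f_score(correct / n_pred, correct / n_gold)

def lexicon_f(gold, predicted):
    """Lexicon F: the same idea applied to word types rather than tokens."""
    gold_lex = {w for utt in gold for w in utt}
    pred_lex = {w for utt in predicted for w in utt}
    correct = len(gold_lex & pred_lex)
    return f_score(correct / len(pred_lex), correct / len(gold_lex))

gold = [["do", "you", "see", "the", "kitty"]]
pred = [["doyou", "see", "the", "kitty"]]  # one undersegmentation error
```

On this toy pair, the undersegmented *doyou* counts against both measures; on real model output the two scores diverge, because frequent words dominate the token measure while each word type counts only once in the lexicon measure.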
Third, when comparing our ideal learner to our constrained learners, we find – somewhat
unexpectedly – that some of our constrained learners perform as well as or better than
their ideal counterparts. For example, when we look at word token F-scores for our bigram
learners, the DMCMC learner seems to be performing equivalently to the Ideal learner
(DMCMC ≠ Ideal: p = 0.144). Among the unigram learners, our DPM and DMCMC learners
both out-perform the Ideal learner to an equivalent degree (DPM > Ideal: p < .001, DMCMC > Ideal: p < .001,
DPM ≠ DMCMC: p = 0.153), and the DPS learner performs equivalently to the Ideal learner (Ideal ≠
DPS: p = 0.136). Turning to the lexicon F-scores, the results look a bit more expected for the
bigram learners: The Ideal learner is out-performing the constrained learners (Ideal > DPM: p <
.001, Ideal > DPS: p < .001, Ideal > DMCMC: p < .001). However, among the unigram learners
we again find something unexpected: the DMCMC learner is out-performing the Ideal learner
(DMCMC > Ideal: p = .006). The Ideal learner is still out-performing the other two constrained
learners, however (Ideal > DPM: p = .031, Ideal > DPS: p < .001).
Fourth, GGJ found that both their ideal learners tended to undersegment (putting multiple
words together into one word), though the unigram learner did so more than the bigram learner
(see Table 5 for examples).
[insert Table 5 approximately here: GGJ ideal model performance: Unigram vs. Bigram]
One way to gauge whether undersegmentation is occurring is to look at the boundary
precision and recall scores. When boundary precision is higher than boundary recall,
undersegmentation is occurring; when the reverse is true, the model is oversegmenting (splitting
single words into more than one word). If we examine Table 4, we can see that (as found by
GGJ) the Ideal learners are undersegmenting, with the bigram model doing so less than the
unigram model. Looking at our constrained learners, we can see that the unigram DMCMC
learner is also undersegmenting. However, every other constrained model is oversegmenting,
with the DPS learners being the most blatant oversegmenters; the bigram DMCMC learner
appears to be oversegmenting the least.
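This diagnostic is easy to state procedurally; a minimal sketch follows (the helper names and toy utterances are ours):

```python
def boundaries(words):
    """Word-internal boundary positions implied by a segmentation
    (utterance edges are excluded, since every model gets those for free)."""
    positions, i = set(), 0
    for w in words[:-1]:
        i += len(w)
        positions.add(i)
    return positions

def boundary_scores(gold, predicted):
    """Boundary precision and recall over a list of segmented utterances."""
    true_pos = sum(len(boundaries(g) & boundaries(p))
                   for g, p in zip(gold, predicted))
    n_pred = sum(len(boundaries(p)) for p in predicted)
    n_gold = sum(len(boundaries(g)) for g in gold)
    return true_pos / n_pred, true_pos / n_gold

def diagnose(gold, predicted):
    """Precision > recall: missing boundaries (undersegmentation);
    recall > precision: extra boundaries (oversegmentation)."""
    p, r = boundary_scores(gold, predicted)
    return "undersegmenting" if p > r else "oversegmenting" if r > p else "balanced"

gold = [["is", "that", "a", "kitty"]]
under = [["isthat", "a", "kitty"]]  # proposes too few boundaries
```

The DPS learners' pattern, for instance, corresponds to boundary recall well above boundary precision.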
We also examined performance on the first and last words in utterances, as compared to
performance over the entire utterance, based on work by Seidl & Johnson (2006), who found that
7-month-olds are better at segmenting words that are either utterance-initial or utterance-final
(see Seidl & Johnson (2006) for detailed discussion on why this might be). If our models are
reasonable reflections of human behavior, we hope to find that their performance on the first and
last words is better than their performance over the entire utterance. Moreover, they should
perform equally on the first and last words in order to match infant behavior. Figures 3 and 4
show word token F-scores for unigram and bigram learners, respectively, for whole utterances,
first words, and last words. Table 6 shows the significance test scores for comparing first word,
last word, and whole utterance performance for each of the learners.
[Insert Figure 3 approximately here: Performance of Bayesian unigram learners on whole
utterances, first words, and last words]
[Insert Figure 4 approximately here: Performance of Bayesian bigram learners on whole
utterances, first words, and last words]
[Insert Table 6 approximately here: Significance test scores]
Looking first to the Bayesian unigram learners, we find that the DPM and DMCMC
learners match infant behavior best by improving equally on first and last words, compared to
whole utterances. The Ideal learner improves on both first and last words, but improves more for
last words than for first words, making its performance slightly different than infants’. The DPS
learner only achieves better performance for first words, making its performance even more
different from infants’. Turning to the Bayesian bigram learners, we find that only the DPM and
DPS learners are matching infants by improving equally on first and last word performance,
compared to whole utterance performance. Both the Ideal and DMCMC learners only improve
for first words, and not for last words.
4. Discussion
Through these simulations, we have made several interesting discoveries. First, though
none of our constrained learners out-performed the best ideal learner (the bigram learner) on all
measures, our constrained learners still were able to extract statistical information from the
available data well enough to out-perform learners that segment by tracking transitional
probability. Since transitional probability strategies have historically been strongly associated
with the idea of “cognitively plausible statistical learning” in models of human language
acquisition (e.g., Saffran et al., 1996; Saffran, 2001; Perruchet & Desaulty, 2008; Pelucchi, Hay,
& Saffran, 2009), our result underscores how statistical learning can be considerably more
successful than is sometimes thought when only transitional probability learners are considered.
In addition, our online Bayesian learners also out-performed several recent statistical models of
word segmentation with respect to identifying a reliable lexicon, while performing comparably at
token and word boundary identification. Our results suggest that even with limitations on
memory and processing, a learning strategy that focuses explicitly on identifying words in the
input and optimizing a lexicon (as all our learners here do) may work better than one that focuses
on identifying boundaries (as transitional probability learners and some recent statistical learning
models do).
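For comparison, the boundary-focused strategy that transitional probability learners embody can be sketched as follows, positing a word boundary at each local minimum of the transitional probability. This is a generic sketch in the spirit of Saffran et al. (1996), not the specific variant evaluated here:

```python
from collections import Counter

def train_tp(utterances):
    """Estimate forward transitional probabilities P(b | a) between adjacent
    units (phonemes or syllables) from unsegmented utterances."""
    pair_counts, start_counts = Counter(), Counter()
    for units in utterances:
        for a, b in zip(units, units[1:]):
            pair_counts[(a, b)] += 1
            start_counts[a] += 1
    return {pair: c / start_counts[pair[0]] for pair, c in pair_counts.items()}

def segment(units, tp):
    """Posit a boundary wherever the transitional probability between two
    units is a local minimum relative to its neighbors."""
    probs = [tp.get((a, b), 0.0) for a, b in zip(units, units[1:])]
    words, start = [], 0
    for i in range(1, len(probs) - 1):
        if probs[i] < probs[i - 1] and probs[i] < probs[i + 1]:
            words.append("".join(units[start:i + 1]))
            start = i + 1
    words.append("".join(units[start:]))
    return words
```

Note that a strategy like this never builds a lexicon; it only decides where boundaries fall, which is exactly the contrast drawn above.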
Second, we discovered that a bias that was helpful for the ideal learner – to assume words
are predictive units – is not always helpful for constrained learners. This suggests that we must
be careful in transferring the solutions we find for ideal learners to learners who have constraints
on their memory and processing the way that humans do. In this case, we speculate that the
reason some of our constrained learners do not benefit from the bigram assumption has to do
with the algorithm’s ability to search the hypothesis space; when tracking bigrams instead of just
individual words, the learner’s hypothesis space is much larger. It may be that some constrained
learners do not have sufficient processing resources to find the optimal solution (and perhaps to
recover from mistakes made early on). However, not all constrained learners suffer from this.
There were constrained learners that benefited from the bigram assumption, which suggests less
processing power may be required than previously thought to converge on good word
segmentations. In particular, if we examine the DMCMC learner, we can decrease the number of
samples per utterance to simulate a decrease in processing power. Table 7 shows the F-scores by
word tokens for both the unigram and bigram DMCMC learner with varying samples per
utterance. Though performance does degrade when processing power is more limited, these
learners still out-perform the best phonemic transition probability learner variant we identified
(which had scores around 38 for word tokens), even when sampling only 0.057% as much as the
ideal learner. Moreover, the bigram assumption continues to be helpful, even with very little
processing power available for the DMCMC learner.
[Put Table 7 approximately here: Performance on test set 1 for DMCMC learners]
If we constrain the ideal learner so it can only sample as often as the DMCMC learner
does, we find that the unigram learner’s segmentation performance is not quite as good as the
DMCMC unigram learner’s (see Table 8), though the bigram learner is much closer to (and in
the case of the lexicon scores, better than) the DMCMC bigram learner. One could imagine that
the DMCMC learner scores so well because the DMCMC algorithm is simply more efficient
than the Gibbs sampler used by the ideal learner; that is, given the same number of total samples,
DMCMC is able to find a higher-probability segmentation than the Gibbs sampler. According to
the results in Table 9, however, this is not the case: even when achieving higher segmentation
scores, the DMCMC learner still finds a segmentation that actually has a lower posterior
probability than its ideal learner counterpart. So it is not that the DMCMC learner is better at
finding optimal solutions than the ideal learner – instead, it appears that some solutions that are
sub-optimal with respect to posterior probability are actually better than those “optimal”
solutions with respect to segmentation performance measures. This suggests that there could be
something gained by the DMCMC learner’s method of approximated inference if we are more
interested in good segmentation performance (to be discussed further below).
[Put Table 8 approximately here: Performance on test set 1 for DMCMC learners and ideal
learners that only sample as much as the DMCMC learners do.]
[Put Table 9 approximately here: Posterior probability vs. segmentation performance on test set 1
for the Ideal and DMCMC learners.]
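To make the posterior comparison concrete, a complete segmentation can be scored under a Dirichlet process unigram model roughly as follows. This is only a sketch: the hyperparameter value and the simple phoneme-level base distribution are illustrative assumptions, not the exact settings used in these simulations:

```python
from collections import Counter
from math import exp, log

def log_p0(word, n_phonemes=50, p_stop=0.5):
    """Illustrative base distribution over novel word forms: phonemes drawn
    uniformly from the inventory, with a geometric distribution over length."""
    return (len(word) * log(1.0 / n_phonemes)
            + (len(word) - 1) * log(1 - p_stop) + log(p_stop))

def log_prob_segmentation(words, alpha=20.0):
    """Joint log probability of a word sequence under a Chinese restaurant
    process: the i-th word is generated either in proportion to its count
    so far (a reused lexical item) or via the base distribution (a novel one)."""
    counts, total = Counter(), 0.0
    for i, w in enumerate(words):
        total += log((counts[w] + alpha * exp(log_p0(w))) / (i + alpha))
        counts[w] += 1
    return total
```

Under a score like this, reusing a frequent item raises the probability of each later occurrence of that item, which is one way to see why a unigram learner with full access to corpus frequencies is tempted to store frequent collocations as single lexical items.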
Turning to the more general comparison of the ideal learner to the constrained learners,
we made a surprising discovery – namely that some of our constrained unigram learners out-
performed the ideal learner. This is somewhat counterintuitive, as one might naturally assume
that less processing power would lead to equivalent if not worse performance.
To rule out the possibility that these results are an artifact of this particular corpus, we
tested our learners on a larger corpus of English, the Pearl-Brent derived corpus available
through CHILDES (MacWhinney 2000). This corpus contains child-directed speech to children
between 8 months and 9 months old, consisting of 28,391 utterances (96,920 word tokens, 3,213
word types, average words per utterance: 3.4, average phonemes per word: 3.6). In Table 10, we
report the learners’ performance on five test sets generated from this corpus (these were
generated the same way as the ones from the Bernstein-Ratner corpus were). The same
surprising performance trend appears, where the DMCMC unigram learner is out-performing the
Ideal unigram learner – though only with respect to tokens and word boundaries, and not with
respect to lexicon items.
[Insert Table 10 approximately here: Average performance of different learners on five test sets
from the Pearl-Brent derived corpus. ]
We subsequently looked at the errors being made by both the ideal and the DMCMC
unigram learners on these English corpora, and discovered a potential cause for the surprising
behavior. It turns out that the ideal learner makes many more undersegmentation errors on
highly frequent bigrams consisting of short words (e.g., can you, do you, and it’s a segmented as
canyou, doyou, and itsa) while the DMCMC learner does not undersegment these bigrams.
When the DMCMC learner does make errors on frequent items that are different from the errors
the ideal learner makes, it tends to oversegment, often splitting off sequences that look like
English derivational morphology, such as “-s” (plural or 3rd person singular present tense) and “-ing”
(progressive) (e.g., ringing segmented as ring ing, and flowers segmented as flower s). If we
survey the errors made by each learner for items occurring 7 or more times in the first test set of
each English corpus and which are not shared (i.e., only one learner made the error), we find the
DMCMC learner’s additional errors are far fewer than the ideal learner’s additional errors (as
shown in Table 11).
[Insert Table 11 approximately here: Analysis of unshared errors made by the ideal and
DMCMC unigram learners for items occurring 7 or more times in the first test set of each
corpus.]
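The unshared-error tally summarized in Table 11 reduces to a simple set computation (the function and variable names are ours; the frequency threshold of 7 is the one used above):

```python
def unshared_errors(errors_a, errors_b, item_freq, min_freq=7):
    """Errors made by only one of the two learners, restricted to items
    occurring at least min_freq times in the test set."""
    frequent = {item for item, count in item_freq.items() if count >= min_freq}
    return (errors_a - errors_b) & frequent, (errors_b - errors_a) & frequent
```

Applied to the two learners' error sets, the first return value holds the frequent errors unique to one learner (e.g., the ideal learner's undersegmentations) and the second those unique to the other.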
Why might this particular error pattern occur? A possible explanation for this error
pattern is related to the ideal learner’s increased processing capabilities. Specifically, the ideal
learner is granted the memory capacity to survey the entire corpus for frequency information and
update its segmentation hypotheses for utterances occurring early in the corpus at any point
during learning. This allows the ideal unigram learner to notice that certain short items (e.g.,
actual words like it’s and a) appear very frequently together. Given that it cannot represent this
mutual occurrence any other way, it will decide to make these items a single lexical item;
moreover, it can fix its previous “errors” that it made earlier during learning when it thought
these were two separate lexical items. In contrast, the DMCMC learner does not have this
omniscience about item frequency in the corpus, nor as much ability to fix “errors” made earlier
in the learning process. This results in the DMCMC learner leaving these short items as
separate, particularly when encountered in earlier utterances. As they then continue to exist in
the lexicon as separate lexical items, undersegmentation errors do not occur nearly as much.
In summary, more processing power and memory capacity do appear to hurt the
inference process of the ideal unigram learner, even if that learner identifies a segmentation with
a higher posterior probability. This behavior is similar to Newport (1990)’s “Less is More”
hypothesis for human language acquisition, which proposes that limited processing abilities are
advantageous for tasks like language acquisition because they selectively focus the learner’s
attention. With this selective focus, children are better able to home in on the correct