to allocate greater visual attention to a matching referent; however, this partial
knowledge may be inadequate to elicit a haptic response toward the referent
because of the additional effort involved in executing an action. Importantly,
this is not to say that the action system is delayed, because more robust
representations can result in accurate reaching. This approach proposes that
looking and reaching do not necessarily tap into identical representations
but instead gauge different levels of knowledge.
Examining the online dynamics of visual and haptic responses is crucial to
determining the level of representation required to generate these responses.
This, in turn, is pivotal for understanding how the developing cognitive system
The study of early language comprehension presents a particularly ripe area
within which to investigate these dynamics. At present there are three primary
paradigms in use for the assessment of early comprehension vocabulary. These
paradigms utilize parent report, haptic responses, and visual attention.
Moreover, gaining a detailed understanding of how early lexical knowledge
transitions to more explicit knowledge states is crucial because it has been well
documented that infants who demonstrate delays in both language comprehension
and production are at the greatest risk for continued language delay and later
developmental deficits (Karmiloff-Smith, 1992; Desmarais, Sylvestre, Meyer,
Bairati, & Rouleau, 2008; Law, Boyle, Harris, Harkness, & Nye, 2000).
Most of what we currently know about the utility of visual and haptic
responses as measures of early language abilities comes from studies that have been
conducted in a piecemeal fashion, in which investigators selectively use either
looking time (Behrend, 1988; Fernald & McRoberts, 1991; Fernald, Zangl,
Portillo, & Marchman, 2008; Golinkoff, Hirsh-Pasek, Cauley, & Gordon, 1987;
Hirsh-Pasek & Golinkoff, 1996; Houston-Price, Mather, & Sakkalou, 2007;
Naigles & Gelman, 1995; Reznick, 1990; Robinson, Shore, Hull Smith &
Martinelli, 2000; Schafer, 1998; Thomas, Campos, Shucard, Ramsay &
Shucard, 1981) or haptic response (Bates et al., 1988; Snyder, Bates, &
Bretherton, 1981; Woodward, Markman, & Fitzsimmons, 1994; Friend &
Keplinger, 2003; Friend & Keplinger, 2008; Friend, Schmitt, & Simpson,
2012) but not both. Results from these studies demonstrate that the relationship
between looking time and parent report is highly variable, though they suggest
that visual attention may be more sensitive than parent report to newly acquired,
less robust word knowledge.
Many visually based methods adopt global metrics such as overall looking
time; historically, more fine-grained, micro-level measures that assess speed
of processing and patterns of visual attention have been ignored (Aslin, 2007).
However, within the last decade micro-level metrics have offered interesting
insights into underlying cognitive processes. The “looking-while-listening”
paradigm, first outlined by Fernald, McRoberts, and Swingley (2001), has evolved
from the well-documented preferential looking procedure to an on-line measure
of saccades in response to speech. Eye movements are monitored by digital
camcorders and saccades are coded frame-by-frame to determine infants’ speed
in processing words. These continuous data yield a richer, more nuanced picture
of language processing than do dichotomous measures obtained by parent report
or macro-level looking time measures. To test the predictive validity of
measuring individual differences in speed of language processing, Marchman
and Fernald (2008) retested a group of 8-year-old children who had been tested
with the looking-while-listening procedure at 25 months of age. Their results
indicated that the speed with which the target word was processed (as measured
by visual reaction time) and the size of children’s lexicons at 25 months were
predictive of intellectual functioning and language skills at 8 years of age.
Researchers utilizing haptic response measures of early language have
obtained findings comparable to those from visually based measures (Friend et al. 2003,
2008, 2012; Woodward et al. 1994). Friend and colleagues conducted a series of
studies investigating the predictive validity of the Computerized Comprehension
Task (CCT), a computer-based measure that uses touch response to gauge early
word comprehension. The score on CCT (proportion of correct touches to a
named visual referent) was found to be a reliable and valid measure of word
comprehension in the 2nd year of life, and a significant predictor of productive
language abilities in the 3
year. Additionally performance on the CCT was
significantly correlated with parent report on the MCDI: WG (Friend &
Keplinger, 2003; 2008; Friend, Schmitt, & Simpson, 2012). Despite the
predictive value of this measure, it suffers from a quandary that exists for all
measures that require a volitional response: does one interpret both incorrect and
absent responses as indicating a lack of knowledge, or do these two response
types differ in a predictable way?
To our knowledge, only one study has used both looking and
reaching as measures of early language. Using the Interactive Intermodal
Preferential Looking Paradigm (IPLP), Gurteen et al. (2011) investigated 13- and
17-month-olds’ rapid word learning abilities. After being taught a new label for
a novel object, infants participated in two trial types: preferential-looking and
preferential-reaching. Results showed that, when prompted with the novel label,
13- and 17-month-olds looked significantly longer at the target object but failed
to display recognition haptically. The authors suggest that the discrepancy
between the results obtained visually and haptically is potentially due to the effort
involved in organizing and executing an action, which may divert attention away
from the target location. However, this claim has never been directly assessed.
Indeed, very few studies have examined how infants’ visual attention relates to
executing actions and, conversely, how actions can influence visual attention.
Using a head-mounted eye-tracker, Corbetta, Guan, and Williams (2012) followed
one child from 16 to 49 weeks of age and assessed the congruency between her visual
attention and reach location. Similar to previous findings, Corbetta and
colleagues found that vision and action initially exhibited a relatively low
correspondence, which steadily increased until 8 months of age. After 8 months,
looking and reaching became progressively independent such that the child did
not consistently direct her reach to the location where she looked longest. Thus,
over the course of development, children become less reliant on visual cues to guide
reaching; however, what drives dissociations between where infants look and
where they reach is still debated.
There is mounting evidence that looking and reaching may reflect different
levels of knowledge in diverse domains. Indeed, the recent findings from
Gurteen and colleagues extend the debate about the meaning of looking and
reaching as measures of knowledge to early word comprehension. Gurteen et
al. (2011) measured rapid word learning abilities, which by definition establish
less robust word-object knowledge, providing further evidence that looking is a
more sensitive measure of partial representations. However, Gurteen and
colleagues did not measure looking and reaching concurrently; thus the online
relationship between these modalities as measures of early language is still
unknown. The overarching goal of the current study is to assess the
bidirectional relationship between vision and action, and to evaluate the
underlying word representations that guide infants’ visual and haptic responses.
To determine the real-time dynamics between visual and haptic response, we
used an intermodal word comprehension task, which allows moment-by-moment
analysis of looking and haptic behavior. To understand how vision
relates to haptic response, we analyzed two different measures of visual
attention: one macro-level (look accuracy) and one micro-level (proportion of
gaze shifts). Moreover, by measuring visual dynamics vis-à-vis haptic behavior
we seek to determine what levels of representation guide correct, incorrect, and
Participants were drawn from a larger NICHD-funded, multi-institutional
longitudinal project extending the Computerized Comprehension Task (CCT) to
the prediction of language production and early literacy in three languages
(English, Spanish, and French). Forty-nine 16- to 18-month-old monolingual
English infants (mean age = 16.7 months) participated in the current study. Data were
collected from 52 participants; three were excluded due to fussiness. All
participants were exposed to at least 80% English, were full term, and had
normal hearing and vision.
The study was conducted in a sound-attenuated room in the Infant and Child
Development Laboratory at San Diego State University (see Figure 1). A 3M
capacitive touch monitor was attached to an adjustable wall-mounted bracket
that was hidden behind blackout curtains and between two portable partitions.
Two HD video cameras were used to record participants’ visual and haptic
responses. The eye-tracking camera was mounted directly above the touch
monitor and peeked out through a small opening in the curtains. The haptic-
tracking camera was mounted on the wall above and behind the touch monitor to
capture both the infants’ haptic response and the stimulus pair presented on the
touch monitor. Speakers were positioned to the right and left of the touch
monitor behind the blackout curtains.
Figure 1 The experimental setting.
Upon entering the testing room, infants were seated on their caregiver’s lap
approximately 30 cm from the touch monitor, and just left of the experimenter.
Parents wore blackout glasses and noise-cancelling headphones to control for
parental influence during the task. The study began with four training trials
containing highly familiar noun pairs to ensure that participants understood the
nature of the task. During the training phase, participants were presented with
noun pairs and prompted by the experimenter to touch one of the images. If the
child failed to touch the screen after repeated prompts, the experimenter touched
the target image for them. If a participant failed to touch during training, the
four training trials were repeated once. Only participants who executed at least
one correct touch during the training phase proceeded to the testing phase.
During testing, each trial lasted until the child touched the screen or until seven
seconds elapsed, at which point the image pair disappeared. If the participant
completed all 41 test trials, they were presented with 13 reliability trials that
were a random sample of test pairs in the opposite side orientation. All image
pairs presented during training, testing, and reliability were matched for word
difficulty (easy, medium, hard), part of speech (noun, adjective, verb), and
visual salience (color, size, luminance). The experimenter started each trial
when the infant’s gaze was directed toward the touch monitor. For a given trial,
two images appeared simultaneously on the right and left side of the touch
monitor. The side on which the target image appeared was determined in pseudo-random
order across trials such that target images could not appear on the same side
on more than two consecutive trials, and the target was presented with equal
proportion on both sides of the screen (Hirsh-Pasek & Golinkoff, 1996). Upon
presentation of the image pair, infants were prompted to touch one of the images
(target). The sentence frame for the prompt changed as a function of target
word part of speech (nouns: Where is the____? Touch____, adjectives: Which
one is ____? Touch____, verbs: Who is __ing? Touch __ing). Touches to the
target (cow in Figure 1), but not to the distractor (pig in Figure 1), produced an
auditory reinforcement corresponding to the image (e.g., “moo”).
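The side-assignment constraints described above (no more than two consecutive trials on the same side, equal proportion on both sides) can be illustrated with a minimal sketch. This is not the procedure used to build the actual trial lists; the function names, the rejection-sampling approach, and the handling of an odd trial count (side counts allowed to differ by one) are assumptions made for illustration only.

```python
import random


def longest_run(sides):
    """Length of the longest run of identical consecutive sides."""
    longest = current = 0
    previous = None
    for side in sides:
        current = current + 1 if side == previous else 1
        longest = max(longest, current)
        previous = side
    return longest


def make_side_sequence(n_trials, max_run=2, seed=None):
    """Hypothetical helper: balanced left/right target sides with no run
    longer than max_run, found by reshuffling until the constraint holds."""
    if max_run < 1:
        raise ValueError("max_run must be at least 1")
    rng = random.Random(seed)
    n_left = n_trials // 2
    # With an odd trial count the two sides differ by one trial (assumption).
    sides = ["L"] * n_left + ["R"] * (n_trials - n_left)
    while True:
        rng.shuffle(sides)
        if longest_run(sides) <= max_run:
            return list(sides)


sequence = make_side_sequence(41, max_run=2, seed=1)
print("".join(sequence))
```

Rejection sampling is chosen here because it is simple and samples uniformly from all orders that satisfy the constraints; for 41 trials it typically needs a few thousand shuffles.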
Videos of infants’ eye movements, haptic response, and a waveform of the
experimenter’s prompts extracted from the eye-tracking video were synced and
coded frame-by-frame (33 ms digital time-code) using Eudico Linguistic
Annotator (ELAN). Coding occurred in two passes. Coder 1 coded the onset
and duration of the target word in the initial prompt and the side of presentation.
Coder 2, blind to side of presentation, coded gaze and haptic behavior. Coding
for each trial began at the onset of the target word and continued until infants
executed a touch or the trial ran to completion. At each frame, gaze was coded
as: left fixation, right fixation, or away look. The haptic response was coded
starting at the frame in which the arm initiated its trajectory toward the screen
resulting in a touch and was terminated at the frame in which infants made
contact with the screen. The haptic response was coded as: Target touch
(unambiguous touch to the labeled referent), Distractor touch (a touch to the
unlabeled referent or both images simultaneously), or No Touch (no haptic
response). Gaze data were compiled in two variables: proportion of gaze shifts
and look accuracy.
3.1 Look Accuracy and Haptic Response
The accuracy measure was calculated by dividing looking time to the target
by the total looking time toward the screen on a given trial. To investigate the
relationship between look accuracy and haptic response we calculated average
look accuracy for the three different haptic types (Target, Distractor, and No
Touch). A one-way analysis of variance (ANOVA) revealed a main
effect of haptic type, F(2, 47) = 103.796, p < .001, such that look accuracy
varied with where a touch was directed (Target: M = .649, SD = .076;
Distractor: M = .390, SD = .100) and with whether a touch was executed at all
(No Touch: M = .547, SD = .090). Post hoc tests with Bonferroni correction
indicated that all pairwise comparisons were significant.
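The look accuracy measure defined above (looking time to the target divided by total looking time toward the screen) can be sketched from frame-by-frame gaze codes as follows. The code labels ("left", "right", "away"), the function name, and the example trial are hypothetical; because every frame has the same duration, the ratio of frame counts equals the ratio of looking times.

```python
def look_accuracy(gaze_frames, target_side):
    """Proportion of on-screen looking time spent on the target side.

    gaze_frames: per-frame gaze codes, each "left", "right", or "away"
    target_side: "left" or "right"
    Returns None when the infant never looked at the screen.
    """
    on_screen = [code for code in gaze_frames if code in ("left", "right")]
    if not on_screen:
        return None
    target_frames = sum(code == target_side for code in on_screen)
    return target_frames / len(on_screen)


# Invented trial: 6 frames on the left image, 2 away, 4 on the right image.
trial = ["left"] * 6 + ["away"] * 2 + ["right"] * 4
print(look_accuracy(trial, "left"))
```

Frames coded as away looks are excluded from the denominator, which matches the definition in terms of looking time toward the screen rather than total trial time.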
Figure 2 Mean look accuracy by haptic type.
To assess whether eye gaze alone provides evidence of lexical
knowledge above and beyond what is gauged by the haptic measure, we
compared average look accuracy on No Touch trials to chance performance
(50%). A one-sample t test showed that the proportion of looking to the target on No
Touch trials was significantly greater than expected by chance, t(48) = 42.92, p <
.001. This lends support to the idea that looking is perhaps a more sensitive
measure of early word knowledge than is reaching.
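For readers unfamiliar with the comparison against chance, the one-sample t statistic is the difference between the sample mean and the chance value divided by the standard error of the mean. The sketch below uses invented per-infant values, not data from this study, and the function name is hypothetical; it returns the statistic and degrees of freedom only, leaving the p value to a t table or statistics package.

```python
import math
import statistics


def one_sample_t(values, chance):
    """Return (t, degrees of freedom) for a one-sample t test against chance."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    t = (mean - chance) / (sd / math.sqrt(n))
    return t, n - 1


# Invented look accuracy scores for three infants (illustration only).
scores = [0.6, 0.7, 0.8]
t_value, df = one_sample_t(scores, chance=0.5)
print(round(t_value, 3), df)
```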
3.2 Proportion of Gaze Shifts and Haptic Performance
Number of discrete looks is a measure of infant visual attention, and is
associated with later intellectual functioning (Colombo, Mitchell, Coldren, &
Freeseman, 1991; Rose, Futterweit, & Jankowski, 1999). In prior research,
discrete looks were operationalized as the number of saccades between a
stimulus pair. In the present study, because trial lengths varied, we divided the
number of discrete looks by total looking time to obtain a proportion of gaze
shifts per trial. To determine whether visual attention patterns change as a function of
haptic response type, we calculated the proportion of gaze shifts for each haptic
type. A one-way ANOVA revealed a main effect of haptic type, F(2, 47) = 78.1,
p < .001.
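The normalization described above (number of discrete looks divided by total looking time) can be sketched as follows. Two assumptions are made for illustration: a discrete look is counted as each maximal run of consecutive frames coded to the same image, and looking time is expressed in milliseconds using the 33 ms frame duration of the video coding. The code labels and function names are hypothetical.

```python
FRAME_MS = 33  # frame duration of the video time-code


def count_discrete_looks(gaze_frames):
    """Count maximal runs of consecutive frames on the same image.

    An away look ends the current look, so returning to the same image
    after looking away counts as a new look (assumption).
    """
    looks = 0
    previous = None
    for code in gaze_frames:
        if code in ("left", "right") and code != previous:
            looks += 1
        previous = code
    return looks


def proportion_gaze_shifts(gaze_frames, frame_ms=FRAME_MS):
    """Discrete looks per millisecond of on-screen looking time."""
    looking_frames = sum(code in ("left", "right") for code in gaze_frames)
    if looking_frames == 0:
        return None
    return count_discrete_looks(gaze_frames) / (looking_frames * frame_ms)


# Invented trial: left, right, a brief away look, right again, then left.
trial = (["left"] * 10 + ["right"] * 10 + ["away"] * 5
         + ["right"] * 10 + ["left"] * 10)
print(count_discrete_looks(trial), proportion_gaze_shifts(trial))
```

Dividing by looking time rather than by trial duration keeps the measure comparable across trials that end early because a touch was executed.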
Figure 3 Proportion of gaze shifts by haptic type.
Post hoc tests using Bonferroni corrections indicated that the mean
proportion of gaze shifts for No Touches (M = .000796, SD = .000211) was
significantly lower than for Target touches (M = .00116, SD = .00257) and
Distractor touches (M = .00125, SD = .000232). However, Target and Distractor
touches were not significantly different.
The ability to recognize and reference the meaning of familiar words
gradually increases over the 2nd year of life. Initially, understanding of familiar
words may require contextual cues to support recognition. Eventually stronger,
more symbolic representations of word-referent pairings must develop.
Consequently, the early lexicon likely consists of both weak (i.e., contextually-
dependent) and strong (abstract) word representations (Tomasello, 2003).
Although there is a rich literature on early language, few studies have examined
how assessments based on different response modalities relate, especially in
terms of the level of knowledge that these modalities index. The fundamental
assumption of looking- and reaching-based methods is that visual and haptic
behaviors are proxies for underlying knowledge. However, results using looking
and reaching to gauge early cognitive abilities occasionally conflict, with haptic
responses appearing to be less sensitive than visual responses. This is the first
study to examine the online relationship between infant visual attention and
In the current study, we analyzed how two measures of visual attention,
one macro-level (look accuracy) and one micro-level (proportion of gaze
shifts), relate to haptic response. We found that look accuracy significantly predicts
haptic behavior, such that infants looked longer at the impending touch location.
On No Touch trials, looking to the target was significantly greater than expected
by chance. This finding is compatible with the view that looking and reaching
are not analogous measures of knowledge but in fact gauge different levels of
understanding; looking may be more sensitive than reaching to less robust
representations (Munakata et al., 1998; Munakata, 2001; Munakata et al., 2003).
However, we found that visual fixation patterns, characterized by proportion of
gaze shifts, changed as a function of action; gaze shifts were more frequent on
trials in which a haptic response was performed relative to No Touch trials in
which images were passively viewed. Taken together, macro- and micro-level
measures of visual attention present a somewhat discrepant view on the relation
between looking and reaching.
We offer two interpretations for the present results. One interpretation is
that these results reflect infants’ failure to comply with task demands during
haptically-based methods, and consequently haptic measures can systematically
underestimate early word knowledge. Look accuracy was greater during Target
and No Touch trials, and substantially reduced during Distractor touch trials.
This finding suggests that infants have no knowledge of the target referent
during Distractor touch trials, but do display knowledge during No Touch trials,
suggesting that looking-based measures are more sensitive than haptic-based
measures. The significantly greater proportion of gaze shifts during Distractor
touch trials may simply be a byproduct of coordinated eye-arm movements.
Indeed, there is evidence from adult work to suggest that simultaneous arm
movements decrease fixation duration and increase the speed of saccadic
movements (Epelboim et al., 1997; Snyder, Calton, Dickenson, & Lawrence,
2002). However attractive it is to attribute the greater proportion of gaze shifts
during Distractor touches to action execution in general, there is currently no
research to suggest infants increase their saccadic rate during action execution.
Another interpretation is that the proportion of gaze shifts is not promoted by
coordinated action but instead reflects an attempt, albeit unsuccessful, to
reconcile the key features of the target referent relative to the distractor. It has
been shown that a more sophisticated attentional style, marked by shorter
fixation durations and a high proportion of gaze shifts, is reliably predictive of
later intellectual functioning (Colombo et al., 1993; Fagan, 1981; Rose et al.,
1999). Thus, the more sophisticated attentional style demonstrated during
Distractor touches potentially represents weak knowledge of the target word, or
at the least a greater level of understanding than is present during No Touch
trials. On this view, look accuracy is poor during Distractor
touches because infants are unsuccessful at reconciling the difference between
the two images and choose the wrong referent for the haptic response, which
guides their gaze to the intended touch location (the distractor), resulting in
reduced look accuracy. Indeed, some researchers have suggested that action
diverts infants’ attention from the target because of the additional demands
involved in planning, organizing, and executing an action (Gurteen et al., 2011).
Moreover, a diversion to the distractor is more likely if knowledge of the target
referent is weak (Munakata, 2001). By this account, greater look accuracy
during No Touch trials is not due to understanding the target word, but instead is
a result of the length of time infants are given to survey the objects. Trials in the
current study are longer than those in traditional looking-based paradigms (Hirsh-Pasek
& Golinkoff, 1986; Fernald et al., 2001; Houston-Price et al., 2007) because it
takes longer to plan, organize, and execute an action than to orient to a stimulus.
There is evidence to suggest that when infants comprehend a word they will