with hedonic browsing (see also Bucklin et al. 2002). We focus on exploratory search, or
hedonic browsing, in the context of search for content.
Many content websites attempt to guide consumers’ exploration processes, e.g., to improve
users’ experience of the site, or to provide exposure to certain types of content. Generally,
websites guide consumers by exposing them to recommendations generated by designated
algorithms. Given the recent advances in recommendation algorithms (e.g., Adomavicius and
Tuzhilin 2005; Herlocker et al. 2004), why would user-generated links (and dual network
structures) contribute to the process of exploration over a product network? The answer may be
found through analogy to the social network. Recent marketing literature connects network
structure properties to information dissemination in social networks (e.g., Goldenberg, Libai,
and Muller 2001; Goldenberg et al. 2009; Katona, Zubcsek, and Sarvary 2010; Shaikh,
Rangaswamy, and Balakrishnan 2006; Trusov, Bucklin, and Pauwels 2009; Valente 1996; Van
den Bulte and Wuyts 2007). Specifically, the informative advantage of weak ties and structural
holes in social networks has been demonstrated repeatedly (starting from the seminal works by
Burt (1992) and Granovetter (1973)). In principle, individuals who have strong ties usually
possess similar information and have little to contribute to each other (e.g., in a job search
scenario as in Granovetter, 1973). By contrast, individuals with weak ties belong to different
social circles, and therefore information they transmit to one another is more likely to be new.
In the context of the product network, recommendation systems are likely to link content
items that are broadly similar to one another, with relatively low variance. This occurs for two
reasons. First, such systems are heavily built on data that come from directed (keyword) search.
When an individual searches for specific content (e.g., “Seinfeld episode”), her subsequent
searches are likely to be for similar content (e.g., more episodes of Seinfeld, interviews with the
actors, etc.). Second, the recommendation system aggregates the choices of the entire population,
attempting to achieve a high overall likelihood of satisfaction. The intersection of such
preferences is likely to exhibit relatively low variety in content. The result of these two
mechanisms is that, in most cases, the algorithm recommends products that are similar to one
another. For example, we find that on YouTube, 56% of the product network links connect
videos within the same category.
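A within-category share of this kind can be computed directly from a list of links. The following is a minimal sketch and not the authors' code: the function name `same_category_share` and the toy videos and categories are hypothetical, chosen only to illustrate the calculation.

```python
def same_category_share(links, category):
    """Return the fraction of links whose two endpoints share a category."""
    if not links:
        return 0.0
    same = sum(1 for src, dst in links if category[src] == category[dst])
    return same / len(links)


# Toy data (hypothetical; not drawn from the YouTube data set).
category = {"v1": "Comedy", "v2": "Comedy", "v3": "Music", "v4": "Education"}
links = [("v1", "v2"), ("v2", "v1"), ("v1", "v3"), ("v3", "v4")]

share = same_category_share(links, category)
print(f"Share of within-category links: {share:.0%}")
```

The same function could be applied to the links generated by a single user to obtain a per-user share.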
In social network research, frequency of communication (e.g., number of phone calls or the
number of joint posts) is often used as a proxy for tie strength. Applying this logic to the product
network setting, we can consider recommendation links as strong ties between products, given
that these links are based on the preferences of the majority (for example, being frequently co-
viewed or co-tagged). We explore and confirm this premise in the empirical part of this paper.
As discussed above, during the exploration process, and especially as time elapses, strong ties
may cease to be interesting or informative because they point to similar content (e.g., videos or
books). To branch out from the close circle of similar content, weaker ties are required.
These weak ties are provided by the dual network structure. Just as, according to social
network theory, an individual can bridge across several social circles, we can consider each
individual user of a website as being situated in the intersection of different types of product
circles (for example, different genres or categories). The typical user does not post links to
identical content types but rather presents a variety of content to reflect his or her different
preferences. For example, a user may have favorite funny commercials as well as favorite
educational videos, which have little in common. Thus, users will often bridge across products
of different circles based on their preferences (for example, on YouTube, fewer than 20% of the
links generated by a given user connect products of the same category). The resulting
user-generated links function as the weak ties of the product network. Because content exploration
is an ongoing process that often includes seeking new information, users may well benefit from
observing weak ties (user-generated links) alongside the strong ties (algorithm-induced
recommendations).
To the best of our knowledge, our work is the first to study the dual network often present
in electronic commerce sites. Work on product networks includes a study of a network of
content sites by Katona and Sarvary (2008), who investigated how Web sites strategically link
to one another in a market for advertising links. However, product networks were not
explicitly mentioned in that study. Oestreicher-Singer and Sundararajan (2011) studied the
network of books on Amazon.com and quantified the incremental correlation in book sales
attributable to product networks' visibility. We propose that the integration of social and product
networks will facilitate exploration of content, and that the exploration process is more efficient
in a dual-network structure that incorporates user-generated links than in a regular product
network based on algorithmically induced recommendations.
OVERVIEW OF DATA
Using data from YouTube.com, one of the largest existing dual networks, we conduct an in-
depth analysis of the dual-network structure. YouTube’s core business is centered around
videos, which are the website’s “products”. Each video has an associated webpage that is
connected by links to other videos’ webpages, thus creating a product network. In addition to the
product network, YouTube offers a social network in which each user has an associated
webpage (Figure 3 presents a sample product page and a sample user page); these webpages can
be linked to other user pages (creating a social network) and to video pages (connecting the
product and social networks). This creates a dual network structure (see Figure 2 above).
Using a crawler, we collected data on the YouTube product network and social network. The
data cover approximately 700,000 videos and 50,000 users, connected by
approximately 10 million links. The links include algorithm-generated links between videos
based on co-consumption (labeled "Related Videos"), social links between user pages
("Friends"), and user-generated links between products and users ("Owner" and "Favorites").
(Insert Figure 3 about here)
Data were collected using snowball network sampling, a common technique for sampling
large networks (Ahn et al. 2007; Carrington, Scott, and Wasserman 2005;
Wasserman and Faust 1994). Specifically, we used a breadth-first algorithm starting with the 25
most viewed videos on YouTube.com, following each video's links to its owner and to its
related videos. We then followed users' friend links as well as related-video links, and continued
up to four hops from the source (fourth-degree neighbors). At the fourth level of the data collection, we
collected only outgoing links of nodes (videos) to other nodes already in our dataset. Incomplete
information is a common problem when sampling a network of this size, especially regarding
the outer edges of the network. By using this method, which collects data about the links of
these outer nodes to inner nodes in the sampled network, we reduce the level of incompleteness.
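The sampling procedure described above can be sketched as a depth-limited breadth-first traversal. This is an illustrative sketch under stated assumptions, not the crawler used in the study: `get_neighbors` is a hypothetical callback standing in for whatever retrieves a page's outgoing links, all link types are treated uniformly, and the toy graph is invented.

```python
from collections import deque


def snowball_sample(seeds, get_neighbors, max_hops=4):
    """Breadth-first snowball sample up to max_hops from the seed nodes.

    Nodes at the outermost level are not expanded; only their outgoing
    links to nodes already in the sample are recorded.
    """
    depth = {s: 0 for s in seeds}   # node -> number of hops from a seed
    queue = deque(seeds)
    edges = set()
    outer = []                      # nodes at exactly max_hops
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            outer.append(node)
            continue
        for nbr in get_neighbors(node):
            edges.add((node, nbr))
            if nbr not in depth:
                depth[nbr] = depth[node] + 1
                queue.append(nbr)
    # Outer level: keep only links that point back into the sample.
    for node in outer:
        for nbr in get_neighbors(node):
            if nbr in depth:
                edges.add((node, nbr))
    return depth, edges


# Toy directed graph (hypothetical): a chain with one link back to the seed.
graph = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"], "e": ["a", "f"]}
depth, edges = snowball_sample(["a"], lambda n: graph.get(n, []))
print(sorted(depth.items()))
print(sorted(edges))
```

In the toy graph, node "e" sits four hops from the seed, so its link back to "a" is kept while its link to the unsampled node "f" is dropped.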
THE TOPOLOGY OF DIFFERENT NETWORK STRUCTURES
The first step in our empirical investigation required in-depth analysis of the dual network
and a comparison of this structure with the following two alternative network structures:
The product network. To create an example of a product network, we extracted only the
product nodes and the links between them from the above-mentioned data set. To make sure all
the networks we considered were of comparable sizes, and since there were 50,000 user nodes in
the dual network, in the studies described below we used a reduced version of the dual network,
which we generated by eliminating 50,000 randomly chosen product nodes from the outer edge
of the original dual network.
The synthetic dual network. As suggested, the dual-network structure may facilitate a more
efficient exploration process. It is therefore important to understand whether this is due to the
unique information carried over user-generated links, or whether this is simply a result of
offering a more diverse set of options, in which case the same benefits could be achieved easily
by implementing a more advanced recommendation algorithm. The literature on network
structure (Newman, 2003) suggests that adding random links reduces the average distance
between products in a network. Such rewiring can therefore potentially assist the exploration
process as well. To be able to compare such a network to the dual network, we constructed a
synthetic dual network. This network is based on the YouTube product network mentioned
above (the reduced network with 50,000 user nodes eliminated), with 50,000 artificially created
dummy-user pages, each randomly connected to different products, according to the degree
distribution of the real user nodes.
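One way to build such dummy-user pages is sketched below. It rests on assumptions that the text does not confirm: "according to the degree distribution of the real user nodes" is read as reusing the observed degree sequence of the real users, and each dummy user's products are drawn uniformly at random without replacement. The function name, the `dummy_user_` labels, and the toy inputs are hypothetical.

```python
import random


def synthetic_user_links(real_user_degrees, products, seed=0):
    """Attach one dummy user per real user, preserving each real degree.

    Each dummy user is linked to distinct products chosen uniformly at
    random (an assumption; the original procedure may differ).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    links = {}
    for i, k in enumerate(real_user_degrees):
        k = min(k, len(products))  # a user cannot link to more products than exist
        links[f"dummy_user_{i}"] = rng.sample(products, k)
    return links


# Toy inputs (hypothetical): ten products and three real-user degrees.
products = [f"v{i}" for i in range(10)]
real_user_degrees = [3, 1, 5]
dummy_links = synthetic_user_links(real_user_degrees, products)
for user, vids in dummy_links.items():
    print(user, vids)
```

Because the links are random, any difference between this network and the real dual network can be attributed to the information carried by actual user choices rather than to the added connectivity alone.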
Using the three networks (the YouTube dual network, the YouTube product network and
the synthetic dual network), we computed several indices that are commonly used in the
literature to characterize network structures and effectiveness (Newman 2003; Wasserman and
Faust 1994), including the following:
1. The degree (number of links) of each node, including the indegree (the number of
incoming links) and outdegree (the number of outgoing links).
2. Closeness centrality of each node: a measure of the average minimal distance (number of
hops) between this node and any other node in the network.
3. Betweenness centrality of each node: a measure of the number of shortest paths (between
any two nodes) in which this node is included.
4. PageRank of each node (Brin and Page 1998): an iterative measure of centrality, which
is based on the number of links pointing to a node and the centrality of the nodes from
which those links originate.
5. Assortative mixing level of each node: a measure of the level of similarity (homophily)
between a given node and its neighbors (Newman 2003).
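Two of the indices listed above, degree and betweenness centrality, can be illustrated on a toy directed network. The sketch below is not the authors' implementation: betweenness is computed with Brandes' algorithm for unweighted graphs, which is a standard choice assumed here, and the five-node network (four videos and one user page "u") is hypothetical. Closeness, PageRank, and assortative mixing are omitted for brevity.

```python
from collections import deque


def degrees(adj):
    """Return (indegree, outdegree) dicts; every node must be a key of adj."""
    outdeg = {v: len(nbrs) for v, nbrs in adj.items()}
    indeg = dict.fromkeys(adj, 0)
    for nbrs in adj.values():
        for w in nbrs:
            indeg[w] += 1
    return indeg, outdeg


def betweenness(adj):
    """Unnormalized betweenness centrality for a directed, unweighted graph."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack, dist = [], {s: 0}
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


# Toy dual network (hypothetical): two pairs of videos joined via user "u".
adj = {
    "v1": ["v2", "u"],   # v1 links to a related video and to its owner
    "v2": ["v1"],
    "v3": ["v4"],
    "v4": ["v3"],
    "u": ["v3"],         # the user links to a favorite in another cluster
}
indeg, outdeg = degrees(adj)
bc = betweenness(adj)
print(indeg, outdeg)
print(bc)
```

In this toy network the user page lies on every shortest path from the first pair of videos to the second, so it receives the highest betweenness score.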
Note that we treated all types of nodes as one integrated network when computing the
indices. Table 1 shows our findings for the YouTube dual network and for the synthetic dual
network, separated according to the types of nodes.
Looking at the YouTube dual network (the two left columns of Table 1), it is perhaps
surprising that even when both types of nodes are included in one combined network, they have
notably different structural properties. Most importantly, compared with video (product) nodes,
user nodes in the YouTube dual network have a significantly higher betweenness centrality
(approximately three times higher). This observation suggests that user pages (and their
associated user-generated links) play an important role in increasing network connectivity.