Mapping is related to clustering and classification: the systems involved take the document clusters
or classes and arrange them in two-dimensional space according to how similar the documents are to
one another across the entire collection. Documents that share elements are placed closer together,
while less similar documents are placed farther apart.
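The paragraph above does not say how any particular tool computes its layout. As a rough illustration only, the sketch below assumes documents are compared by the Jaccard distance between their word sets and then placed in the plane with a basic form of metric multidimensional scaling (gradient descent on "stress"). This is a swapped-in textbook technique, not the method of any specific product; the names `jaccard_distance` and `layout_2d` and the four toy documents are hypothetical.

```python
import math


def jaccard_distance(a, b):
    """1 - |shared words| / |all words|: 0 for identical word sets, 1 for disjoint ones."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return 1.0 - len(sa & sb) / len(sa | sb)


def layout_2d(dist, iterations=2000, lr=0.01):
    """Place n items in the plane so that Euclidean distances approximate dist[i][j].

    Illustrative metric-MDS sketch (an assumption, not any tool's actual method):
    gradient descent on the stress  sum over i<j of (||p_i - p_j|| - dist[i][j]) ** 2.
    """
    n = len(dist)
    # Deterministic start: items spread evenly around the unit circle.
    pts = [[math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)] for i in range(n)]
    for _ in range(iterations):
        grad = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                dx, dy = pts[i][0] - pts[j][0], pts[i][1] - pts[j][1]
                d = math.hypot(dx, dy) or 1e-9
                # coef > 0 (too far apart) pulls i and j together; coef < 0 pushes them apart.
                coef = 2.0 * (d - dist[i][j]) / d
                grad[i][0] += coef * dx
                grad[i][1] += coef * dy
                grad[j][0] -= coef * dx
                grad[j][1] -= coef * dy
        for i in range(n):
            pts[i][0] -= lr * grad[i][0]
            pts[i][1] -= lr * grad[i][1]
    return pts


docs = [
    "carbon nanotube transistor gate",
    "carbon nanotube transistor channel",
    "lithium battery electrolyte anode",
    "lithium battery electrolyte cathode",
]
dist = [[jaccard_distance(a, b) for b in docs] for a in docs]
points = layout_2d(dist)
```

With these toy inputs the two nanotube documents (which share three of five words) land close together, as do the two battery documents, while the two pairs, which share no words, are placed well apart.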
The FAQ section for the IN-SPIRE tool [61], a close cousin of the ThemeScape tool (both were originally
developed at Pacific Northwest National Laboratory), gives the following explanation of the process
used to create spatial maps:
In brief, IN-SPIRE™ creates mathematical representations of the documents, which are then
organized into clusters and visualized into "maps" that can be interrogated for analysis.
More specifically, IN-SPIRE™ performs the following steps:
• The text engine scans through the document collection and automatically determines the
distinguishing words or "topics" within the collection, based upon statistical measurements of
word distribution, frequency, and co-occurrence with other words. Distinguishing words are
those that help describe how each document in the dataset is different from any other
document. For example, the word "and" would not be considered a distinguishing word,
because it is expected to occur frequently in every document. In a dataset where every
document mentions nanotech, "nanotech" wouldn't be a distinguishing word either.
• The text engine uses these distinguishing words to create a mathematical signature for each
document in the collection. Then it does a rough similarity comparison of all the signatures to
create cluster groupings.
• IN-SPIRE™ compares the clusters against each other for similarity, and arranges them in
high-dimensional space (about 200 axes) so that similar clusters are located close together.
The clusters can be thought of as a mass of bubbles, but in 200-dimensional space instead of
just 3.
• That high-dimensional arrangement of clusters is then flattened down to a comprehensible 2-dimensions,
trying to preserve a picture where similar clusters are located close to each other,
and dissimilar clusters are located far apart. Finally, the documents are added to the picture by
arranging each within the invisible “bubble” of their respective cluster.
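The FAQ does not publish IN-SPIRE's actual statistics, so the following is only a minimal sketch of the first two steps under stated assumptions: "distinguishing words" are approximated with inverse document frequency (IDF), each document's "mathematical signature" is assumed to be a unit-length TF-IDF vector, and the "rough similarity comparison" is stood in for by single-pass leader clustering on cosine similarity. The function names, the 0.25 threshold, and the toy documents are hypothetical.

```python
import math
from collections import Counter


def distinguishing_words(docs):
    """IDF weight per word; words present in every document are dropped.

    Assumption: IDF stands in for IN-SPIRE's unpublished statistics. A word found
    in all n documents (e.g. "and", or "nanotech" in an all-nanotech collection)
    says nothing about how the documents differ.
    """
    n = len(docs)
    df = Counter(word for doc in docs for word in set(doc))
    return {word: math.log(n / count) for word, count in df.items() if count < n}


def signature(doc, idf):
    """Unit-length TF-IDF vector restricted to the distinguishing words."""
    tf = Counter(word for word in doc if word in idf)
    vec = {word: count * idf[word] for word, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {word: v / norm for word, v in vec.items()}


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (dicts)."""
    return sum(v * b.get(word, 0.0) for word, v in a.items())


def rough_clusters(sigs, threshold=0.25):
    """Single-pass 'leader' clustering: each document joins the first cluster whose
    leader is at least `threshold` similar to it, otherwise it starts a new cluster."""
    leaders, clusters = [], []
    for i, sig in enumerate(sigs):
        for k, leader in enumerate(leaders):
            if cosine(sig, sigs[leader]) >= threshold:
                clusters[k].append(i)
                break
        else:
            leaders.append(i)
            clusters.append([i])
    return clusters


texts = [
    "nanotech carbon nanotube transistor and gate",
    "nanotech carbon nanotube transistor and channel",
    "nanotech lithium battery and electrolyte",
    "nanotech lithium battery and anode",
]
docs = [t.split() for t in texts]
idf = distinguishing_words(docs)
sigs = [signature(d, idf) for d in docs]
clusters = rough_clusters(sigs)  # [[0, 1], [2, 3]]
```

As in the FAQ's own example, "and" and "nanotech" receive no weight here because they occur in every document, so the grouping is driven entirely by the words that tell the documents apart.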
Spatial concept maps can also be produced with neural-network methods, the most famous of which is
arguably the Kohonen Self-Organizing Map (SOM):
Kohonen Self-Organizing Maps: a type of artificial neural network (ANN) that is trained using
unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized
representation of the input space of the training samples, called a map. Self-organizing maps are
different from other artificial neural networks in the sense that they use a neighborhood function to
preserve the topological properties of the input space [62].
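To make the neighborhood function concrete, here is a minimal self-organizing map written from the general description above; it is not taken from Kohonen's papers or from any patent analytics product. The Gaussian neighborhood, the linearly decaying learning rate and radius, the initialization from random training samples, and the names `train_som` and `best_matching_unit` are all illustrative assumptions.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid node whose weight vector is closest (squared Euclidean distance) to x."""
    return min(weights, key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols self-organizing map on a list of equal-length vectors.

    Illustrative sketch: schedules and initialization are assumptions, not a reference design.
    """
    rng = random.Random(seed)
    sigma0, sigma_end = max(rows, cols) / 2.0, 0.5
    # Initialize each node's weight vector to a copy of a randomly chosen training sample.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        samples = list(data)
        rng.shuffle(samples)
        for x in samples:
            frac = step / total
            lr = lr0 * (1.0 - frac)                       # learning rate decays linearly
            sigma = sigma0 + (sigma_end - sigma0) * frac  # neighborhood radius shrinks
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                # Gaussian neighborhood measured on the map grid: nodes near the
                # best-matching unit also move toward x, which preserves topology.
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                for k, v in enumerate(x):
                    w[k] += lr * h * (v - w[k])
            step += 1
    return weights


cluster_a = [[0.01 * i, 0.0, 0.02 * i] for i in range(5)]
cluster_b = [[1.0, 1.0 - 0.01 * i, 1.0] for i in range(5)]
som = train_som(cluster_a + cluster_b, rows=2, cols=2)
nodes_a = {best_matching_unit(som, x) for x in cluster_a}
nodes_b = {best_matching_unit(som, x) for x in cluster_b}
```

Because nodes that are adjacent on the grid are pulled toward the same inputs, similar inputs tend to land on the same or neighboring nodes; in the toy example the two well-separated clusters end up on different nodes of the map.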
For additional discussion of the use of spatial concept maps in patent analytics, please see the
following blog post:
http://www.patinformatics.com/blog/machine-learning-in-patent-analytics-part-3-spatial-concept-maps-for-exploring-large-domains/
[61] http://in-spire.pnnl.gov/faq_7.stm
[62] http://en.wikipedia.org/wiki/Self-organizing_map