Network analysis can be a very powerful visualization technique, but it has not fully caught on with the
majority of vendors who make tools in the patent analysis space. Hopefully, more examples of this
type of functionality will be available in the future.
8.6.2 – Spatial Concept Maps
Spatial concept mapping is related to clustering, or classification, since it generally begins with one of
these methods, but it adds an extra component to the task: identifying the relative similarity between
the categories created. The tools involved take the document clusters, or classes, and
arrange them in two-dimensional space by considering the similarity of the documents, or clusters,
relative to one another over the entire collection. Documents that share elements in common are
placed closer together spatially, while those with less similarity are placed further apart. This analysis
task was introduced in section 6.5. Using layers in conjunction with spatial concept maps was covered
in section 6.6. Most spatial concept maps begin with a clustering, or unsupervised machine learning,
step, which was covered in section 6.4.
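The idea that documents sharing elements in common sit closer together can be sketched with a simple
similarity measure. The example below is a minimal illustration only, assuming plain word counts and
cosine similarity; the toy abstracts and the function names term_vector and cosine_similarity are
hypothetical, and commercial mapping tools may use quite different measures and layout algorithms.

```python
import math
from collections import Counter

def term_vector(text):
    """Count lowercase word occurrences in a piece of text."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical toy abstracts, invented purely for illustration.
docs = {
    "A": "battery electrode lithium coating",
    "B": "lithium battery electrode separator",
    "C": "antenna signal receiver circuit",
}
vectors = {name: term_vector(text) for name, text in docs.items()}

sim_ab = cosine_similarity(vectors["A"], vectors["B"])  # three shared terms
sim_ac = cosine_similarity(vectors["A"], vectors["C"])  # no shared terms
```

A layout step would then try to place A and B near one another and C further from both, since A and B
score as more similar than A and C.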
Since there seems to be an X and a Y axis on most maps, many users think these visualizations
behave like a scatterplot, where extrapolating between the X and Y values can identify empty spaces on the
map. In reality, there is no X or Y axis associated with the maps, and the distance between
documents, usually represented by dots, is based on the similarity of the documents to one another and
to all of the other documents in the collection. Distance, therefore, is relative to the
document collection, and guesses cannot generally be made about what sort of document might
occupy an empty space on the map.
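One way to see why distance is relative to the collection is to weight terms by how rare they are across
all documents. The sketch below assumes a common TF-IDF variant, which is an assumption for illustration
and not necessarily what any particular mapping tool uses; the toy documents and the names tfidf_vectors
and cosine are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Weight term counts by log(N / document frequency) + 1 (one common TF-IDF variant)."""
    counts = [Counter(d.lower().split()) for d in docs]
    n = len(docs)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for c in counts for term in c)
    return [
        {term: tf * (math.log(n / doc_freq[term]) + 1.0) for term, tf in c.items()}
        for c in counts
    ]

def cosine(a, b):
    """Cosine similarity between two sparse weighted vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# The same two documents placed in two different hypothetical collections.
pair = ["lithium battery", "lithium capacitor"]
focused = pair + ["lithium anode", "lithium cathode"]    # "lithium" is everywhere
mixed = pair + ["antenna receiver", "signal filter"]     # "lithium" is distinctive

vf = tfidf_vectors(focused)
vm = tfidf_vectors(mixed)
sim_focused = cosine(vf[0], vf[1])
sim_mixed = cosine(vm[0], vm[1])
```

Under these assumptions the same pair of documents scores as more similar in the mixed collection than in
the focused one, because the shared term carries more weight when it is rarer. A position on the map
therefore has no fixed meaning outside the collection that produced it.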
While the maps, and the document organization, are provided in two dimensions, a third dimension is often
implied by incorporating document density. The tightness of the clustering in a group, or the number
of documents found in the group, is used to show which groups contain the highest number
of documents. On a topographical version of a spatial map, this is represented by an implied
increase in peak height on the map, visualized using a change in color. Many of the spatial maps,
especially the ones based on clustering methods, also provide contour lines on the diagrams.
Generally, these lines are drawn based on the distance between the document dots. The distance
between a dot and its nearest neighbor determines the boundaries of the lines. Once the threshold is
exceeded, the line is drawn between the two dots. It has been speculated that contour lines
encompassing multiple groups on a map imply a relationship between these groups, but generally
this is not the case, and the lines are simply based on the spread of the documents.
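The dependence of contour lines on dot spacing can be illustrated with a small sketch. It reflects one
possible reading of the description above, assuming that a contour encloses every dot reachable through
a chain of neighbors whose gaps stay within a threshold. The coordinates, the threshold values, and the
function name contour_groups are hypothetical, and actual tools may draw their contours differently.

```python
import math

def contour_groups(points, threshold):
    """Group dots that are linked by chains of neighbor gaps no larger than threshold."""
    unvisited = set(range(len(points)))
    groups = []
    while unvisited:
        start = min(unvisited)
        unvisited.remove(start)
        group, frontier = [start], [start]
        while frontier:
            i = frontier.pop()
            # Collect still-unassigned dots close enough to dot i.
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) <= threshold]
            for j in near:
                unvisited.remove(j)
                group.append(j)
                frontier.append(j)
        groups.append(sorted(group))
    return groups

# Hypothetical map coordinates: three dots close together, two further away.
dots = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.2), (5.0, 5.0), (5.4, 5.1)]

tight = contour_groups(dots, threshold=1.0)
loose = contour_groups(dots, threshold=10.0)
```

With the smaller threshold the dots fall into two separate contours, while the larger threshold encloses
all of them in one. The single outer contour arises only from the spacing of the dots, which is consistent
with the point that such a line need not signal a relationship between the groups inside it.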
There are a few keys to creating good spatial concept maps that will be more easily interpreted by
clients. The first involves the choice of words used to generate the vector that will be compared
between documents. When working with full-text patent documents, an analysis of this type should be
restricted to certain sections of the document, such as the claims, or the titles and abstracts. Working
with the entire body of text can confuse the system, since some sections, such as the background
of the invention, discuss other inventions rather than the one covered by the patent.
In addition, when working with full text, the words chosen by the algorithm creating the vector will
likely be suboptimal, since there are so many words to choose from.
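Restricting the vector to selected sections can be sketched as follows. This is a minimal illustration,
assuming each patent is held as a dictionary of named text fields; the record, the field names, and the
function build_vector are hypothetical and do not correspond to any particular tool or data format.

```python
from collections import Counter

def build_vector(patent, fields=("title", "abstract", "claims")):
    """Count terms using only the selected sections of a patent record."""
    text = " ".join(patent.get(field, "") for field in fields)
    return Counter(text.lower().split())

# Hypothetical patent record, invented for illustration.
patent = {
    "title": "Lithium electrode coating",
    "abstract": "A coating for a lithium electrode",
    "background": "Prior antenna designs used copper receivers",
    "claims": "An electrode comprising a lithium coating",
}

focused = build_vector(patent)
full = build_vector(patent, fields=("title", "abstract", "background", "claims"))
```

In this toy record the background section introduces terms such as "antenna" that describe other
inventions. They appear in the full-text vector but not in the focused one, which is the kind of noise
that restricting the sections is meant to avoid.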
Users can selectively add stopwords to their map settings. Stopwords are also referred to as
non-content-bearing words, and they can adversely impact clustering results if they are included in the
vector, since they do not impart knowledge of the topic area. Almost all mapping tools come with a list
of standard stopwords, such as “the”, “and”, “a”, and other non-content-bearing terms, but users can
also look at initial results and identify other words that do not add meaning to the technology being