Chapter 7: Frameworks Associated with Performing Patent Analytics and Patent Landscape Reports
When developing a patent analytics output, there are certain fundamental ideas, or philosophies,
associated with conducting the work. These thoughts apply generally to almost any task associated
with patent analysis and are provided at a higher level of abstraction than the tips and instructions
associated with specific analysis. They are referred to as frameworks since, as with a building
scaffold, they provide a foundation on which more detailed and specific analysis tasks can be
attached. In this chapter, various frameworks, or general principles, associated with patent analytics and landscapes are discussed.
7.1 – Content Types for Conducting Analysis
The vast majority of analytics projects can be broken into two categories: those that work with data in the form of exact strings in structured fields, and those that work with unstructured text or semantic content. Analytics utilizing these different sources of content are referred to as data mining or analytics, and text mining or analytics, respectively. Some data mining purists consider text to be simply a more complicated form of data, but for the purposes of this discussion the two will be kept separate, since the methods used to work with exact strings in data mining are different from those used to analyze text in text mining.
7.1.1 – Data Mining
When analysts are working with numbers, or strings of letters divorced from context or semantic
meaning, they are performing data mining or analysis. The practitioner is generally conducting a
statistical analysis of a list of items to identify patterns within a collection.
In the most common form of this type of analysis, discrete items, such as Patent Assignees or Application Filing Years, are counted, and potentially ranked, within a set. Tools such as a Pivot Table in Microsoft Excel look for exact string matches to determine how frequently a specific value is found. All analyses in this category are based on exact string matches, so misspellings or inconsistencies within a set will be counted as separate values. For instance, when working with Patent Assignee Names, if a data set contains entries for both Vertex Pharmaceuticals and Vertex Pharmaceuticals Inc., these will be treated as discrete items and counted separately, since the two alphabetic strings do not match exactly. Clearly, these items belong together, and as discussed in Chapter 6.1, the process of cleaning up or grouping these items together is required to ensure that an accurate count of the appropriate values takes place.
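To make the exact-match behavior concrete, the short Python sketch below counts the assignee strings from the example above, first as raw values and then after mapping the variants to a single canonical name. The strings and the mapping dictionary are illustrative, not taken from a real data export.

from collections import Counter

# Assignee names as they might appear in an exported patent data set.
assignees = [
    "Vertex Pharmaceuticals",
    "Vertex Pharmaceuticals Inc.",
    "Vertex Pharmaceuticals",
]

# An exact-string count treats every spelling variant as a separate item.
print(Counter(assignees))
# Counter({'Vertex Pharmaceuticals': 2, 'Vertex Pharmaceuticals Inc.': 1})

# After cleanup, variants are mapped to one canonical name, so the count
# reflects the organization rather than the spelling.
canonical = {"Vertex Pharmaceuticals Inc.": "Vertex Pharmaceuticals"}
cleaned = [canonical.get(name, name) for name in assignees]
print(Counter(cleaned))
# Counter({'Vertex Pharmaceuticals': 3})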
Another important component of data mining or analysis is that the material being analyzed is structured, or fielded. When items are placed in a specific field within a database, or when they are found in a specific column in a spreadsheet, they are considered structured or fielded. This implies that they belong to a category of one type or another. Inventor Names, for instance, are recognized as being data of a particular type and are segregated and organized so they are collected together in one place for analysis. Additional names might be found within a document, but the analyst can distinguish these other names from the Inventor Names since the latter have been structured by being collected into a specific field.
Data mining is thus characterized as the analysis of exact string matches contained in structured or
fielded databases or collections.
7.1.2 – Text Mining
Linguistic content is distinct from data strings for a variety of reasons. To start with, there is the concept of spelling variation, where words that mean the same thing can be spelled differently. There is also the concept of context, where the same word has a different definition depending on how it is used. Similarly, parts of speech can be considered when analyzing text, since a word can be used as a verb in some cases and as a noun in others.
Raw text is generally considered to be unstructured or semi-structured since the content is not organized into categories. According to Wikipedia (http://en.wikipedia.org/wiki/Unstructured_data), unstructured text refers to information that either does not have a pre-defined data model or does not fit well into relational tables. A patent abstract is an example of a semi-structured item, since the abstract is a field in most databases and there is an expectation of what type of content will be found there. The claims, on the other hand, while representing the legally binding portion of the text, can be very long and deal with a variety of concepts, and by nature are not structured into discrete items.
The methods and means for analyzing linguistic content, due to the complicated nature of the source, are very different from those used when working with data, so it is important to consider and understand them individually. As an example, the following series of steps might be conducted in order to prepare a collection of unstructured text for analysis (a sketch of these steps in code follows the list):
• Tokenization – explaining to the computer where one word ends and another begins
• Stemming – removing common suffixes and prefixes from words to generate the root of a word
for subsequent use
• Part-of-Speech Tagging – identifying words as nouns, verbs or adjectives
• Entity Tagging – using lists of items or linguistic rules to identify a token as a person, organization or other type of entity
• Term Filtering – reducing the number of terms or objects to be analyzed by removing stopwords (non-content-bearing terms), or very frequently or infrequently occurring terms, in a corpus or collection
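As an illustration of how these steps fit together, the following Python sketch uses the open-source NLTK library. The sample sentence, the lookup list used for entity tagging, and the choice of NLTK itself are assumptions made for illustration; the resource names passed to nltk.download can vary between NLTK versions.

import nltk
from nltk.stem import PorterStemmer
from nltk.corpus import stopwords

# One-time downloads of the required NLTK resources.
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")
nltk.download("stopwords")

text = "Vertex Pharmaceuticals filed an application claiming a novel compound."

# Tokenization: split the raw string into individual word tokens.
tokens = nltk.word_tokenize(text)

# Part-of-Speech Tagging: label each token as a noun, verb, adjective, etc.
tagged = nltk.pos_tag(tokens)

# Stemming: reduce each token to its root form for subsequent use.
stemmer = PorterStemmer()
stems = [stemmer.stem(t) for t in tokens]

# Term Filtering: drop stopwords and punctuation-only tokens.
stops = set(stopwords.words("english"))
filtered = [t for t in tokens if t.isalpha() and t.lower() not in stops]

# Entity Tagging: a simple list lookup standing in for a trained
# named-entity recognizer.
organizations = ["Vertex Pharmaceuticals"]
entities = [name for name in organizations if name in text]

print(tagged)
print(stems)
print(filtered)
print(entities)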
In general, when conducting analytics associated with generating PLRs, the analyst needs to understand whether a data-based or a text-based method is being performed. Since the methods involved are quite different, optimal results will depend on the analyst understanding the different approaches and applying them properly.
7.2 – Data Scale for Conducting Analysis
In addition to thinking about data collections based on their content, exact strings vs. raw text, data scientists also tend to think about data in terms of the size of the collection they are working with. Generally, this is done because different methods are used depending on how large the collection being worked with will be. Most analyses that end up in PLRs are concerned with larger data sets and are conducted on a macro level, but occasionally it is necessary to provide more detailed examinations of small subsets, a micro-level analysis.
7.2.1 – Macro-Level Analysis
Also referred to as a global-level view, analysis at this scale, if performed on health care or other socially related data collections, would be done on a population-wide level. In the area of patent analytics, macro-level data sets contain more than 10,000 records. Since PLRs are generally broad overviews of a topic area, most of the analytics that go into them are conducted at the macro scale. When working with macro-level collections there is a greater reliance on computational methods, due to the amount of time and effort it would take to analyze sets of this size manually.
7.2.2 – Meso-Level Analysis
Sometimes referred to as a local-level review, analysis at this scale, if performed on social data, would be done on a group-wide level. Thinking about patent analysis projects, meso-level data sets contain between 1,000 and 10,000 records. Many of the same methods used for macro-level analysis will also be used with meso-level collections, since sets this large are difficult to manage when records are examined individually. The computing resources and time required to perform these analyses are less than what is needed at the macro level. Many PLR analytics are conducted at this level when sub-collections within a broader topic area are explored. The practice of working with subsets of a larger whole is sometimes referred to as “drilling into” a data set.
7.2.3 – Micro-Level Analysis
Sometimes referred to as the individual level, analysis at this scale is generally conducted on a record-by-record basis. Thinking about patent analysis projects, micro-level data sets contain fewer than 1,000 records, and analysis is frequently performed on collections of fewer than 100 documents. Many of the analyses done at this level are performed manually, in circumstances where a high degree of precision and human ingenuity is needed to ensure a meaningful result. In work associated with PLRs, detailed analysis of this type is performed in order to confirm trends and associations discovered while conducting macro- or meso-level analysis. This is especially the case when counter-intuitive results are obtained during larger-scale analytics and the analyst wants to better understand the cause of these trends. Certain activities related to PLRs, such as patent valuation, are often best done on a case-by-case basis.
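The record-count thresholds used in Sections 7.2.1 to 7.2.3 can be captured in a simple helper function, sketched below in Python. How the exact boundary values of 1,000 and 10,000 records are bucketed is an assumption, since the text does not specify which level they belong to.

def analysis_scale(record_count):
    """Classify a patent data set by the record-count thresholds above."""
    if record_count > 10_000:
        return "macro"  # population-wide; computational methods dominate
    if record_count >= 1_000:
        return "meso"   # group-wide; drilling into sub-collections
    return "micro"      # individual; often manual, record-by-record review

print(analysis_scale(25_000))  # macro
print(analysis_scale(4_500))   # meso
print(analysis_scale(120))     # micro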
7.3 – The Linear Law of Patent Analysis
The Linear Law of Patent Analysis was proposed as a framework for performing patent analytics projects in 2002 (http://www.infotoday.com/searcher/oct02/trippe.htm). It was originally developed to assist practitioners in understanding the importance of starting an analysis by investigating the needs of the customer for the analytics, as opposed to simply jumping in with an analysis tool. It has since been used as a general method for planning analysis projects. The steps in the process are:
• Create a toolkit of analysis tools
• Understand the business need and the need behind the need
• The need drives the question
• The question drives the data
• The data drives the tool
This is referred to as a linear law since, in this framework, the steps have to be followed in order to provide the best results. Often companies or analysts would start with the purchase of a tool and, once that was accomplished, since a significant investment had been made in the tool, they would use it exclusively to conduct all of their analysis projects. In the suggested framework, the choice of which tool to use is left as the last step, once all of the other parameters associated with the analysis have been worked out.
The law starts with gathering a collection of tools, or a toolkit. There is no one tool that can work with all sorts of data and conduct all types of analysis, so it is important for the analyst to have options. Some projects require semantic or linguistic analysis of text, others require the study of citation patterns and networks, and others still require studying the changes that take place within the text of a patent as it progresses through its life cycle. So, within reason, given budget constraints, a suite of tools should be collected.
The next step speaks to understanding the business requirements that will be satisfied by conducting
the analysis. Under ideal circumstances, the analyst should know precisely what decision a business
leader would be making with the analysis provided. They should also have a good idea about the
situation the organization finds itself in, why there is an issue with it, and have some idea how a
preferred path forward might look. Analytical results should be told as a narrative to have the greatest
impact with the decision maker, and understanding all of the context will allow the analyst to craft their
results into a compelling story that drives decision-making.
Only after the needs are thoroughly understood can the analyst start suggesting questions, and potentially hypotheses, that should be explored during the course of the project. The questions, at least, can be confirmed with the decision maker, which provides confidence that the analyst understands the needs and is thinking about ways to address them. Depending on the needs, either one or several questions can be addressed.
Now that the questions have been established, experiments can be developed that will either confirm or discredit the hypotheses associated with them. In the case of patent analytics, experiments are designed by considering the data that will be analyzed.
Finally, now that all of the other details have been worked out, a decision can be made on which tool will provide the proper insight into the appropriate data to either support or dismiss the hypotheses. The right tool is often critical to an analyst's success, but it must be applied under the proper circumstances to provide critical insight.
Additional background on the history of the Linear Law of Patent Analysis can be found at
http://www.patinformatics.com/blog/the-linear-law-of-patent-analysis-revisited/.
7.4 – Precision and Recall
Information retrieval or searching effectiveness is traditionally described in terms of two measures,
recall and precision. These items are defined as:
• Recall – how much of the useful information has my search retrieved?
• Precision – how much of the information that I have retrieved is useful?
There is also a useful probabilistic interpretation of recall and precision: recall estimates the probability that a relevant document will be retrieved in response to a query, and precision estimates the probability that a retrieved document will be relevant.
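These two measures reduce to simple set arithmetic over the retrieved and relevant document sets, as the Python sketch below shows. The document identifiers are invented placeholders.

retrieved = {"US1", "US2", "US3", "US4"}  # documents returned by the search
relevant = {"US2", "US3", "US5"}          # documents actually on topic

true_positives = retrieved & relevant

recall = len(true_positives) / len(relevant)      # 2/3: useful information retrieved
precision = len(true_positives) / len(retrieved)  # 2/4: retrieved information that is useful

print(f"recall={recall:.2f}, precision={precision:.2f}")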
Thinking about the issues in searching during the preparation of a PLR, information retrieval methods usually look at precision and recall simultaneously, and techniques are measured by how well they stack up against both elements. Even so, precision and recall are normally opposed to one another, such that an increase in recall usually brings a subsequent drop in the level of precision. Generally speaking, as searches are designed to maximize recall, the results can suffer since more off-topic references get included in the collection.
In generating collections for PLRs it might be more productive to begin by creating sets using methods that produce high recall exclusive of precision. When statistical analysis is performed on large, macro-level sets, only major trends, or items that appear frequently, are going to be seen. Precision, in this instance, can suffer to some degree, since minor occurrences within these sets will not be seen in the larger context. The trade-off can often be evaluated by examining several of the significant trends to ensure that they are coming from reasonably precise references. If this is the case, it is generally accepted to sacrifice some precision for the sake of recall.
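One way to carry out this spot check is to draw a small random sample of records from each significant trend and review it by hand. The Python sketch below assumes records are represented as dictionaries with an "assignee" field; the field name and record structure are illustrative assumptions, not a prescribed format.

import random

def sample_for_review(records, top_assignees, per_group=5, seed=42):
    """Draw a small random sample from each major trend so a reviewer
    can manually judge how precise the underlying references are."""
    rng = random.Random(seed)
    samples = {}
    for assignee in top_assignees:
        group = [r for r in records if r["assignee"] == assignee]
        samples[assignee] = rng.sample(group, min(per_group, len(group)))
    return samples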
7.5 – General Skill Set Requirements for Analysts
Proficiency as a patent analyst requires a collection of skill sets on the part of the individual
performing the task. At a minimum a patent analyst should have experience in the following areas:
• Patent information – due to the idiosyncrasies and nuances of patent data it is critical that
people who understand this collection intimately be the ones conducting the analysis. Patent
information, perhaps more than almost any other data source, can be misinterpreted if the
analyst is not familiar with the history and details of it.
• Data analysis and statistics – while most analysis tools and methods are semi-automated and
don’t require adjustments on the part of the analyst, optimal results are obtained when the
practitioner thoroughly understands the variables and parameters associated with an analysis
and can modify them as needed. The results of an analysis are also easier to understand and
explain to a client when the analyst knows the method and how they manipulated it.
• Legal knowledge – while formal accreditation, such as passing a patent bar, is not required, a general understanding of the legal aspects of the patent system, especially in a worldwide context, is certainly helpful. This is especially the case if interpretation of claim language is required to conduct a PLR. Legal perspective is also useful for understanding patent families and how they relate to various national patenting systems.
• Presentation skills – one of the key features of a PLR is its ability to collect a large amount of
information and provide a concise report of the key trends and observations in the area being
studied. The ability to organize large amounts of data into a compelling story and present the
results in an engaging fashion tailored to the learning style of the potential readers is essential
to obtain maximum impact.
• Deductive ability – the launch of each PLR is a blank page with an open question that needs to be investigated. Looking at each project as a new mystery to solve, with its own unique challenges and outcomes, is required. Individuals who enjoy intellectual puzzles and discovering and exploring new topic areas typically enjoy the deductive reasoning aspects of the analyst position.
While it is not necessary for a beginning analyst to have all of these skills when getting started, since many can be developed through training, there should at least be an aptitude for, and an interest in, gaining all of them.