De-identified information can be re-identified (rendered distinguishable) by using a code, algorithm, or pseudonym that is assigned to individual records. The code, algorithm, or pseudonym should not be derived from other related information[55] about the individual, and the means of re-identification should only be known by authorized parties and not disclosed to anyone without the authority to re-identify records. A common de-identification technique for obscuring PII is to use a one-way cryptographic function, also known as a hash function, on the PII.[56]
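To make the hash technique concrete, the following is a minimal Python sketch of keyed-hash pseudonymization. It uses HMAC rather than a bare hash so the pseudonym is not derivable from the PII alone (compare footnote 56); the key value and record fields are hypothetical, and a real deployment would generate the key randomly and store it in a separate, access-controlled system.

```python
import hmac
import hashlib

# Hypothetical key; in practice, generate it randomly and keep it in a
# separate, access-controlled system so the pseudonyms stay one-way for
# anyone without the authority to re-identify records.
PSEUDONYM_KEY = b"example-secret-key-stored-elsewhere"

def pseudonymize(identifier: str) -> str:
    """Map a PII identifier to a fixed-length pseudonym with HMAC-SHA-256.

    A bare hash of the identifier could be reversed by hashing guessed
    values (e.g., all possible SSNs); the secret key prevents that.
    """
    return hmac.new(PSEUDONYM_KEY, identifier.encode("utf-8"),
                    hashlib.sha256).hexdigest()

record = {"ssn": "123-45-6789", "balance": 12500}
record["ssn"] = pseudonymize(record["ssn"])  # SSN replaced by its pseudonym
print(record)
```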
De-identified information can be assigned a PII confidentiality impact level of low, as long as the following are both true:
The re-identification algorithm, code, or pseudonym is maintained in a separate system, with
appropriate controls in place to prevent unauthorized access to the re-identification information.
The data elements are not linkable, via public records or other reasonably available external records,
in order to re-identify the data.
For example, de-identification could be accomplished by removing account numbers, names, SSNs, and
any other identifiable information from a set of financial records. By de-identifying the information, a
trend analysis team could perform an unbiased review on those records in the system without
compromising the PII or providing the team with the ability to identify any individual. Another example
is using health care test results in research analysis. All of the identifying PII fields can be removed, and
the patient ID numbers can be obscured using pseudo-random data that is associated with a cross-
reference table located in a separate system. The only means to reconstruct the original (complete) PII
records is through authorized access to the cross-reference table.
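The cross-reference approach might be sketched as follows, assuming hypothetical patient records. The mapping table is an in-memory dictionary for brevity, whereas the text above requires it to reside in a separate system with its own access controls.

```python
import secrets

# Cross-reference table mapping pseudonyms back to patient IDs. Shown
# in memory for brevity; per the guidance above, it belongs in a separate
# system reachable only by parties authorized to re-identify records.
cross_reference: dict = {}

def assign_pseudonym(patient_id: str) -> str:
    """Replace a patient ID with a random pseudonym that is not derived
    from the ID itself, recording the mapping for authorized lookup."""
    pseudonym = secrets.token_hex(8)  # pseudo-random; carries no PII
    cross_reference[pseudonym] = patient_id
    return pseudonym

def re_identify(pseudonym: str) -> str:
    """Authorized-only reconstruction via the cross-reference table."""
    return cross_reference[pseudonym]

result = {"patient_id": "P-000172", "test": "HbA1c", "value": 5.4}
result["patient_id"] = assign_pseudonym(result["patient_id"])
```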
Additionally, de-identified information can be aggregated for the purposes of statistical analysis, such as
making comparisons, analyzing trends, or identifying patterns. An example is the aggregation and use of
multiple sets of de-identified data for evaluating several types of education loan programs. The data
describes characteristics of loan holders, such as age, gender, region, and outstanding loan balances. With
this dataset, an analyst could draw statistics showing that 18,000 women in the 30-35 age group have
outstanding loan balances greater than $10,000. Although the original dataset contained distinguishable
identities for each person, the de-identified and aggregated dataset would not contain linked or readily
identifiable data for any individual.
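As a worked illustration, the aggregate count described above can be computed directly over de-identified records; the sample data below is invented for the example.

```python
# Invented, de-identified loan records: characteristics only, no identities.
records = [
    {"gender": "F", "age": 32, "region": "NE", "balance": 14200.0},
    {"gender": "F", "age": 34, "region": "SW", "balance": 9100.0},
    {"gender": "M", "age": 31, "region": "NE", "balance": 22000.0},
]

# Count women in the 30-35 age group with balances over $10,000.
count = sum(1 for r in records
            if r["gender"] == "F" and 30 <= r["age"] <= 35
            and r["balance"] > 10000)
print(f"{count} women aged 30-35 owe more than $10,000")
```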
4.2.4 Anonymizing Information
Anonymized information[57] is defined as previously identifiable information that has been de-identified and for which a code or other association for re-identification no longer exists.[58]
[55] This is not intended to exclude the application of cryptographic hash functions to the information.

[56] Hashing may not be appropriate for de-identifying information covered by HIPAA. 45 C.F.R. § 164.514(c)(1) specifically excludes de-identification techniques where the code is derived from the PII itself. Organizations should consult their legal counsel for legal requirements related to de-identification and anonymization.

[57] For additional information about anonymity, see: A. Pfitzmann and M. Hansen, A Terminology for Talking about Privacy by Data Minimization: Anonymity, Unlinkability, Undetectability, Unobservability, Pseudonymity, and Identity Management, updated 2009, http://dud.inf.tu-dresden.de/literatur/Anon_Terminology_v0.32.pdf.

[58] Based on the Common Rule, which governs confidentiality requirements for research, 15 C.F.R. Part 27. Some organizations do not distinguish between the terms de-identified and anonymized information and use them interchangeably. Additionally, the amount of information available publicly and advances in computational technology make full anonymity of released datasets (e.g., census data and public health data) difficult to accomplish. For additional information, see: American Statistical Association, Data Access and Personal Privacy: Appropriate Methods of Disclosure Control, December 6, 2008, http://www.amstat.org/news/statementondataaccess.cfm.
Anonymizing information usually involves the application of statistical disclosure limitation techniques[59] to ensure the data cannot be re-identified, such as:[60]
Generalizing the Data — Making information less precise, such as grouping continuous values

Suppressing the Data — Deleting an entire record or certain parts of records

Introducing Noise into the Data — Adding small amounts of variation into selected data

Swapping the Data — Exchanging certain data fields of one record with the same data fields of another similar record (e.g., swapping the ZIP codes of two records)

Replacing Data with the Average Value — Replacing a selected value of data with the average value for the entire group of data.
Using these techniques, the information is no longer PII, but it can retain its useful and realistic properties.[61]
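The following is a minimal Python sketch of simplified versions of these five techniques. The parameter choices (band width, noise scale, suppression rule) are arbitrary assumptions that a statistician would need to validate in practice (see footnote 59).

```python
import random
import statistics

def generalize_age(age: int, width: int = 5) -> str:
    """Generalization: report an age band instead of an exact value."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def suppress(records: list, is_risky) -> list:
    """Suppression: drop entire records that are too identifying."""
    return [r for r in records if not is_risky(r)]

def add_noise(value: float, scale: float = 0.05) -> float:
    """Noise introduction: perturb a value by a small random fraction."""
    return value * (1 + random.uniform(-scale, scale))

def swap_field(records: list, field: str, i: int, j: int) -> None:
    """Swapping: exchange one field between two similar records."""
    records[i][field], records[j][field] = records[j][field], records[i][field]

def average_field(records: list, field: str) -> None:
    """Averaging: replace each value with the group mean."""
    mean = statistics.mean(r[field] for r in records)
    for r in records:
        r[field] = mean

records = [
    {"age": 33, "zip": "20740", "balance": 14200.0},
    {"age": 41, "zip": "20850", "balance": 9100.0},
]
records[0]["age"] = generalize_age(records[0]["age"])   # -> "30-34"
records = suppress(records, lambda r: r["balance"] > 1_000_000)
records[1]["balance"] = add_noise(records[1]["balance"])
swap_field(records, "zip", 0, 1)
average_field(records, "balance")
```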
Anonymized information is useful for system testing.[62]
Systems that are newly developed, newly
purchased, or upgraded require testing before being introduced to their intended production (or live)
environment. Testing generally should simulate real conditions as closely as possible to ensure the new
or upgraded system runs correctly and handles the projected system capacity effectively. If PII is used in
the test environment, it must be protected at the same level as in the production environment, which can add significantly to the time and expense of testing the system.
Randomly generating fake data in place of PII to test systems is often ineffective because certain
properties and statistical distributions of PII may need to be retained to effectively test the system. There
are tools available that substitute PII with synthetic data generated by anonymizing PII. The anonymized
information retains the useful properties of the original PII, but the anonymized information is not
considered to be PII. Anonymized data substitution is a privacy-specific protection measure that enables
system testing while reducing the expense and added time of protecting PII. However, not all data can be
readily anonymized (e.g., biometric data).
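Real substitution tools apply the statistical disclosure limitation techniques described above; the sketch below shows only the underlying idea of preserving a column's empirical distribution for test data, using invented values.

```python
import random

def synthesize_column(values: list, n: int) -> list:
    """Draw n synthetic values from the empirical distribution of a real
    column so tests see realistic value frequencies without real records.

    Sampling with replacement preserves the observed distribution but does
    not hide rare, unique values; those should be generalized or suppressed
    before the column is used as a sampling source.
    """
    return random.choices(values, k=n)

production_zips = ["20740", "20740", "20850", "21201"]  # invented sample
test_zips = synthesize_column(production_zips, n=1000)
```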
4.3 Security Controls
In addition to the PII-specific safeguards described earlier in this section, many types of security controls
are available to safeguard the confidentiality of PII. Providing reasonable security safeguards is also a
Fair Information Practice. Security controls are often already implemented on a system to protect other
types of data processed, stored, or transmitted by the system. The security controls listed in NIST SP
800-53 address general protections of data and systems. The items listed below are some of the NIST SP
800-53 controls that can be used to help safeguard the confidentiality of PII.
[59] Both anonymizing and de-identifying should be conducted by someone with appropriate training. It may be helpful, as appropriate, to consult with a statistician to assess the level of risk with respect to possible unintended re-identification and improper disclosure. For additional information on statistical disclosure limitation techniques, see OMB's Statistical Policy Working Paper #22, http://www.fcsm.gov/working-papers/spwp22.html. See also Census Bureau, Report on Confidentiality and Privacy 1790-2002, http://www.census.gov/prod/2003pubs/conmono2.pdf.

[60] The Federal Committee on Statistical Methodology provides a checklist to assist in the assessment of risk for re-identification and improper disclosure. For additional information, see the Federal Committee on Statistical Methodology: Confidentiality and Data Access Committee, Checklist on Disclosure Potential of Data Releases, http://www.fcsm.gov/committees/cdac/.

[61] The retention of useful properties in anonymized data is dependent upon the statistical disclosure limitation technique applied.

[62] Anonymization is also commonly used by agencies to release datasets to the public for research purposes.
Note that some of these controls may not be in the recommended set of security controls for the baselines identified in NIST SP
800-53 (e.g., a control might only be recommended for high-impact systems). However, organizations
may choose to provide greater protections than what is recommended; see Section 3.2 for a discussion of
factors to consider when choosing the appropriate controls. In addition to the controls listed below, NIST
SP 800-53 contains many other controls that can be used to help protect PII, such as incident response
controls.
Access Enforcement (AC-3). Organizations can control access to PII through access control policies and access enforcement mechanisms (e.g., access control lists). This can be done in many ways. One example is implementing role-based access control and configuring it so that each user can access only the pieces of data necessary for the user's role; a minimal sketch of this pattern appears after this list. Another example is only permitting users to access PII through an application that tightly restricts their access to the PII, instead of permitting users to directly access the databases or files containing PII.[63] Encrypting stored information is also an option for implementing access enforcement.[64] OMB M-07-16 specifies that Federal agencies must "encrypt, using only NIST certified cryptographic modules, all data on mobile computers/devices carrying agency data unless the data is determined not to be sensitive, in writing, by your Deputy Secretary or a senior-level individual he/she may designate in writing".
Separation of Duties (AC-5). Organizations can enforce separation of duties for duties involving
access to PII. For example, the users of de-identified PII data would not also be in roles that permit
them to access the information needed to re-identify the records.
Least Privilege (AC-6). Organizations can enforce the most restrictive set of rights/privileges or
accesses needed by users (or processes acting on behalf of users) for the performance of specified
tasks. Concerning PII, the organization can ensure that users who must access records containing PII
only have access to the minimum amount of PII, along with only those privileges (e.g., read, write,
execute) that are necessary to perform their job duties.
Remote Access (AC-17). Organizations can choose to prohibit or strictly limit remote access to PII.
If remote access is permitted, the organization should ensure that the communications are encrypted.
User-Based Collaboration and Information Sharing (AC-21). Organizations can provide
automated mechanisms to assist users in determining whether access authorizations match access
restrictions, such as contractually-based restrictions, for PII.
Access Control for Mobile Devices (AC-19). Organizations can choose to prohibit or strictly limit access to PII from portable and mobile devices, such as laptops, cell phones, and personal digital assistants (PDA), which are generally higher-risk than non-portable devices (e.g., desktop computers at the organization's facilities). Some organizations may choose to restrict remote access involving higher-impact instances of PII so that the information will not leave the organization's physical boundaries. If access is permitted, the organization can ensure that the devices are properly secured and regularly scan the devices to verify their security status (e.g., anti-malware software enabled and up-to-date, operating system fully patched).
Auditable Events (AU-2). Organizations can monitor events that affect the confidentiality of PII,
such as unauthorized access to PII.
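The sketch below combines two of the controls above, access enforcement (AC-3) and auditable events (AU-2): role-based filtering of record fields plus an audit log entry for each access attempt. The role names, field sets, and log format are illustrative assumptions, not requirements from NIST SP 800-53.

```python
import logging

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("pii.access")

# Hypothetical roles: each sees only the fields its duties require
# (least privilege, AC-6).
ROLE_FIELDS = {
    "benefits_clerk": {"employee_id", "plan", "coverage_start"},
    "trend_analyst": {"age_band", "region", "balance"},  # no identity fields
}

def read_record(user: str, role: str, record: dict) -> dict:
    """Return only the fields permitted for the caller's role (AC-3),
    logging every attempt so unauthorized access is auditable (AU-2)."""
    allowed = ROLE_FIELDS.get(role)
    if allowed is None:
        audit_log.warning("DENIED user=%s role=%s", user, role)
        raise PermissionError(f"role {role!r} may not access PII records")
    audit_log.info("READ user=%s role=%s fields=%s", user, role, sorted(allowed))
    return {k: v for k, v in record.items() if k in allowed}

row = {"employee_id": "E-104", "plan": "PPO", "coverage_start": "2010-01-01",
       "age_band": "30-34", "region": "NE", "balance": 14200.0}
print(read_record("jdoe", "trend_analyst", row))
```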
[63] For example, suppose that an organization has a database containing thousands of records on employees' benefits. Instead of allowing a user to have full and direct access to the database, which could allow the user to save extracts of the database records to the user's computer, removable media, or other locations, the organization could permit the user to access only the necessary records and record fields. A user could be restricted to accessing only general demographic information and not any information related to the employees' identities.

[64] Additional encryption guidelines and references can be found in FIPS 140-2: Security Requirements for Cryptographic Modules, http://csrc.nist.gov/publications/PubsFIPS.html.