(rendered distinguishable) by using a code, algorithm, or pseudonym that is assigned to individual
records. The code, algorithm, or pseudonym should not be derived from other related information about
the individual, and the means of re-identification should only be known by authorized parties and not
disclosed to anyone without the authority to re-identify records. A common de-identification technique
for obscuring PII is to use a one-way cryptographic function, also known as a hash function, on the PII.
De-identified information can be assigned a PII confidentiality impact level of low, as long as the
following are both true:
- The re-identification algorithm, code, or pseudonym is maintained in a separate system, with
  appropriate controls in place to prevent unauthorized access to the re-identification information.
- The data elements are not linkable, via public records or other reasonably available external records,
  in order to re-identify the data.
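The one-way function technique mentioned above can be sketched as follows. This is a minimal illustration, not a prescribed method: it assumes a keyed hash (HMAC-SHA-256) instead of a plain hash, on the assumption that low-entropy values such as SSNs could otherwise be recovered by hashing every possible value. The function name pseudonymize, the field names, and the sample record are hypothetical, and the key is assumed to be held in a separate, access-controlled system. Because the resulting code is derived from the PII itself, this approach may not satisfy every legal definition of de-identification.

```python
import hashlib
import hmac
import secrets


def pseudonymize(value: str, key: bytes) -> str:
    """Return a keyed one-way digest (HMAC-SHA-256) of a PII value."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical secret key. It is generated here only so the sketch runs;
# in practice it would be stored apart from the de-identified records.
key = secrets.token_bytes(32)

# Hypothetical record containing one PII field and one non-PII field.
record = {"ssn": "123-45-6789", "balance": 10450}

# Replace the PII field with its digest and keep the non-identifying data.
deidentified = {
    "subject_code": pseudonymize(record["ssn"], key),
    "balance": record["balance"],
}
```

The same input and key always produce the same subject_code, so records for one individual remain linkable to each other without exposing the underlying value.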
For example, de-identification could be accomplished by removing account numbers, names, SSNs, and
any other identifiable information from a set of financial records. By de-identifying the information, a
trend analysis team could perform an unbiased review on those records in the system without
compromising the PII or providing the team with the ability to identify any individual. Another example
is using health care test results in research analysis. All of the identifying PII fields can be removed, and
the patient ID numbers can be obscured using pseudo-random data that is associated with a cross-
reference table located in a separate system. The only means to reconstruct the original (complete) PII
records is through authorized access to the cross-reference table.
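The health care example above can be sketched as follows, under stated assumptions. The function name deidentify, the field names, and the sample records are hypothetical. Pseudonyms are drawn at random rather than derived from the patient ID, and the cross-reference table is returned separately so that it can be stored in a different system from the de-identified records.

```python
import secrets


def deidentify(records, id_field, pii_fields):
    """Strip PII fields and replace each ID with a random pseudonym.

    Returns the de-identified records and a cross-reference table that
    maps each pseudonym back to the original ID.
    """
    cross_reference = {}  # pseudonym -> original ID (store separately)
    assigned = {}         # original ID -> pseudonym (used only during the run)
    output = []
    for rec in records:
        original_id = rec[id_field]
        if original_id not in assigned:
            pseudonym = secrets.token_hex(8)
            while pseudonym in cross_reference:  # guard against a collision
                pseudonym = secrets.token_hex(8)
            assigned[original_id] = pseudonym
            cross_reference[pseudonym] = original_id
        # Keep only fields that are neither the ID nor listed as PII.
        cleaned = {
            k: v for k, v in rec.items()
            if k != id_field and k not in pii_fields
        }
        cleaned[id_field] = assigned[original_id]
        output.append(cleaned)
    return output, cross_reference


# Hypothetical test results for two patients.
results = [
    {"patient_id": "P-1001", "name": "A. Example", "test": "HbA1c", "value": 6.1},
    {"patient_id": "P-1001", "name": "A. Example", "test": "LDL", "value": 130},
    {"patient_id": "P-2002", "name": "B. Sample", "test": "HbA1c", "value": 5.4},
]
research_set, cross_reference = deidentify(results, "patient_id", {"name"})
```

Only a party with access to cross_reference can map a pseudonym in research_set back to the original patient ID.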
Additionally, de-identified information can be aggregated for the purposes of statistical analysis, such as
making comparisons, analyzing trends, or identifying patterns. An example is the aggregation and use of
multiple sets of de-identified data for evaluating several types of education loan programs. The data
describes characteristics of loan holders, such as age, gender, region, and outstanding loan balances. With
this dataset, an analyst could draw statistics showing that 18,000 women in the 30-35 age group have
outstanding loan balances greater than $10,000. Although the original dataset contained distinguishable
identities for each person, the de-identified and aggregated dataset would not contain linked or readily
identifiable data for any individual.
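The kind of aggregate statistic described above could be computed along the following lines. This is an illustrative sketch only: the field names, the five-year age bands, the function names, and the sample records are assumptions and are not taken from any actual loan dataset.

```python
from collections import Counter


def age_band(age: int, width: int = 5) -> str:
    """Map an age to a band label such as '30-34' (band width is assumed)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def count_over_threshold(records, threshold):
    """Count loan holders per (gender, age band) with a balance above threshold."""
    counts = Counter()
    for rec in records:
        if rec["balance"] > threshold:
            counts[(rec["gender"], age_band(rec["age"]))] += 1
    return counts


# Hypothetical de-identified records: no names, account numbers, or SSNs.
loans = [
    {"age": 31, "gender": "F", "region": "NE", "balance": 12500},
    {"age": 34, "gender": "F", "region": "SW", "balance": 18200},
    {"age": 33, "gender": "F", "region": "NE", "balance": 4000},
    {"age": 52, "gender": "M", "region": "MW", "balance": 22000},
]
summary = count_over_threshold(loans, 10000)
```

The result contains only group counts, so no row in summary corresponds to a single loan holder. Groups with very small counts may still need to be suppressed or merged, since they could make an individual identifiable.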
4.2.4 Anonymizing Information
Anonymized information is defined as previously identifiable information that has been de-identified and
for which a code or other association for re-identification no longer exists.
applies de-identification methods, determines the risk is very small, and documents the justification. 45 C.F.R. § 164.514,
This is not intended to exclude the application of cryptographic hash functions to the information.
Hashing may not be appropriate for de-identifying information covered by HIPAA. 45 C.F.R. § 164.514 (c)(1) specifically
excludes de-identification techniques where the code is derived from the PII itself. Organizations should consult their legal
counsel for legal requirements related to de-identification and anonymization.
For additional information about anonymity, see: A. Pfitzmann and M. Hansen, A Terminology for Talking about Privacy by
Data Minimization: Anonymity, Unlinkability, Undetectability, Unobservability, Pseudonymity, and Identity Management,
updated 2009, http://dud.inf.tu-dresden.de/literatur/Anon_Terminology_v0.32.pdf
Based on the Common Rule, which governs confidentiality requirements for research, 15 C.F.R. Part 27. Some
organizations do not distinguish between the terms de-identified and anonymized information and use them interchangeably.
Additionally, the amount of information available publicly and advances in computational technology make full anonymity
of released datasets (e.g., census data and public health data) difficult to accomplish. For additional information, see:
American Statistical Association, Data Access and Personal Privacy: Appropriate Methods of Disclosure
Control, December 6, 2008, http://www.amstat.org/news/statementondataaccess.cfm