GDPR Compliance: Anonymization vs. Pseudonymization


When the GDPR goes into effect on May 25, 2018, organizations around the world will forever change the way they think about securing personal data.

Article 32 of the GDPR states that “the controller and the processor shall implement appropriate technical and organisational measures to ensure a level of security appropriate to the risk,” including, as appropriate, the pseudonymisation and encryption of personal data. As firms ponder how to comply with this requirement, most conversations revolve around two common practices: pseudonymization and anonymization.

The text of the GDPR offers little concrete advice on how to achieve the “appropriate measures” referenced in Article 32. Fortunately, in 2014 the EU’s Article 29 Data Protection Working Party (WP29) released a detailed opinion on anonymization techniques, which sheds some light on the issue for firms preparing for the new regulation.

GDPR and Pseudonymization

Pseudonymization involves replacing the data in personally identifying fields with pseudonyms, such as random numbers or symbols. For example, a pseudonymized customer record might show a U.S. telephone number as ###-###-#### or replace a Social Security Number (SSN) with just its last four digits.
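As a rough illustration of the idea, the following Python sketch pseudonymizes a single customer record. The field names, the sample record, and the secret key are all hypothetical, and the use of a keyed hash (HMAC) for the name field is one common choice assumed here, not a method prescribed by the GDPR or the WP29 opinion.

```python
import hashlib
import hmac

# Hypothetical secret key. In practice it would be stored apart from the data,
# because anyone holding it can recompute the pseudonyms.
SECRET_KEY = b"example-key-kept-apart-from-the-data"


def pseudonym(value: str) -> str:
    """Return a keyed, repeatable pseudonym for an identifying value."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def mask_phone(phone: str) -> str:
    """Replace every digit with '#', keeping the separators."""
    return "".join("#" if ch.isdigit() else ch for ch in phone)


def last_four(ssn: str) -> str:
    """Keep only the last four digits of an SSN."""
    digits = [ch for ch in ssn if ch.isdigit()]
    return "***-**-" + "".join(digits[-4:])


def pseudonymize(record: dict) -> dict:
    """Return a copy of the record with its identifying fields replaced."""
    out = dict(record)
    out["name"] = pseudonym(record["name"])
    out["phone"] = mask_phone(record["phone"])
    out["ssn"] = last_four(record["ssn"])
    return out


# Made-up sample record for illustration only.
record = {"name": "Jane Doe", "phone": "555-867-5309",
          "ssn": "078-05-1120", "zip": "10001"}
safe = pseudonymize(record)
print(safe)
```

Note that the ZIP code passes through untouched, and that the same name always maps to the same pseudonym, so records about one person can still be linked to each other.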

However, simply replacing the data in these fields does not make it impossible to (re)identify individuals in a pseudonymized data set. Even removing certain identifying fields — such as name, phone number, and date of birth — may not prevent unauthorized users from hacking the pseudonymization key or combining non-pseudonymized information to piece together an individual’s personal data.
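To make the re-identification risk concrete, here is a minimal Python sketch of a linkage attack. Both data sets are invented for illustration, and the choice of ZIP code, birth year, and sex as the linking fields is an assumption; the point is only that fields left untouched by pseudonymization can be matched against an outside source.

```python
# Pseudonymized records: direct identifiers are replaced by opaque IDs,
# but other fields are left intact. All values are made up.
pseudonymized = [
    {"id": "a91f", "zip": "10001", "birth_year": 1984, "sex": "F", "diagnosis": "asthma"},
    {"id": "c27b", "zip": "10001", "birth_year": 1990, "sex": "M", "diagnosis": "diabetes"},
    {"id": "e55d", "zip": "94105", "birth_year": 1984, "sex": "F", "diagnosis": "migraine"},
]

# Hypothetical outside source that an attacker might already hold.
public = [
    {"name": "Jane Doe", "zip": "10001", "birth_year": 1984, "sex": "F"},
    {"name": "John Roe", "zip": "10001", "birth_year": 1990, "sex": "M"},
]

# Fields assumed to be present in both data sets.
LINK_FIELDS = ("zip", "birth_year", "sex")


def link(pseudo_rows, public_rows):
    """Map each named person to a diagnosis when the match is unique."""
    index = {}
    for row in pseudo_rows:
        key = tuple(row[f] for f in LINK_FIELDS)
        index.setdefault(key, []).append(row)

    matches = {}
    for person in public_rows:
        key = tuple(person[f] for f in LINK_FIELDS)
        candidates = index.get(key, [])
        if len(candidates) == 1:  # exactly one candidate: re-identified
            matches[person["name"]] = candidates[0]["diagnosis"]
    return matches


reidentified = link(pseudonymized, public)
print(reidentified)
```

No pseudonymization key was needed here: the combination of non-pseudonymized fields was enough to single out two individuals.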

Because of the risk of re-identification, the WP29 opinion states that “pseudonymisation is not a method of anonymisation. It merely reduces the linkability of a dataset with the original identity of a data subject.”

In fact, Recital 26 of the GDPR explicitly states that “The principles of data protection should apply to any information concerning an identified or identifiable natural person. Personal data which have undergone pseudonymisation [...] should be considered to be information on an identifiable natural person.”

GDPR and Anonymization

In contrast with pseudonymization, anonymization involves modifying data sets so that no personally identifiable information remains, making it impossible to identify individuals.

Recital 26 of the GDPR states that data that has been truly anonymized lies outside the scope of the regulation:

“The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable. This Regulation does not therefore concern the processing of such anonymous information, including for statistical or research purposes.”

For businesses, this means that anonymization not only offers a more powerful means of securing personal data, but also enables the use of that data for purposes such as marketing or analysis without violating individuals’ data privacy.

As the WP29 opinion acknowledges, true anonymization is difficult to achieve without rendering the data useless. For this reason, most companies opt for the weaker pseudonymization approach, which typically leaves the pseudonymized data at least somewhat useful and offers some security benefits, but still leaves the data within the scope of the GDPR.

Static vs Dynamic Anonymization

Analysts have traditionally approached anonymization statically. With static anonymization, the analyst must decide ahead of time which fields contain sensitive data. Then he or she must either remove or alter these fields before running the analysis, reducing the quality of the data set. To complicate matters further, the analyst must also consider any additional knowledge a potential hacker might have that could lead to re-identification of the sensitive fields.
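The static workflow described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the field names, the sample rows, the decision to drop names and phone numbers, and the choice to coarsen age into ten-year bands and ZIP codes into three-digit prefixes. The group-size check at the end is a simple k-anonymity-style measure added for this sketch, not a step named in the text above.

```python
from collections import Counter

# Fields the analyst has decided, ahead of time, are directly identifying.
DROP = {"name", "phone"}


def generalize(record: dict) -> dict:
    """Drop identifying fields and coarsen age and ZIP code."""
    out = {k: v for k, v in record.items() if k not in DROP}
    decade = (record["age"] // 10) * 10
    out["age"] = f"{decade}-{decade + 9}"   # e.g. 34 becomes "30-39"
    out["zip"] = record["zip"][:3] + "**"   # e.g. "10001" becomes "100**"
    return out


def smallest_group(rows, fields) -> int:
    """Size of the smallest group of rows sharing the same field values."""
    counts = Counter(tuple(r[f] for f in fields) for r in rows)
    return min(counts.values())


# Made-up sample data.
rows = [
    {"name": "Jane Doe", "phone": "555-0100", "age": 34, "zip": "10001", "purchases": 7},
    {"name": "John Roe", "phone": "555-0101", "age": 38, "zip": "10002", "purchases": 2},
    {"name": "Ann Poe", "phone": "555-0102", "age": 52, "zip": "94105", "purchases": 5},
    {"name": "Sam Coe", "phone": "555-0103", "age": 57, "zip": "94107", "purchases": 1},
]

anonymized = [generalize(r) for r in rows]
k = smallest_group(anonymized, ("age", "zip"))
print(anonymized)
print("smallest group size:", k)
```

The trade-off is visible in the output: exact ages and ZIP codes are gone for every later analysis, whether or not that analysis needed them.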

Advanced tools are now making it possible to anonymize data on a query-by-query basis, an approach known as dynamic anonymization, without destroying the data set’s utility. Aircloak’s database querying tool Diffix, for example, allows analysts to tailor anonymization to the specifics of the query and of the data being requested. Because the program can differentiate which data is considered sensitive under which circumstances, it can deliver an answer set that is fully anonymized yet still useful. (Note: CNIL, the national Data Protection Authority of France, has affirmed that Diffix delivers GDPR-level data anonymity.)
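To give a feel for what anonymizing per query can mean, here is a deliberately simplified Python sketch. It is not Diffix's algorithm or API. It is a generic illustration that assumes two common ideas: suppressing answers that cover too few people, and adding a small amount of random noise to the rest. The threshold, noise scale, function name, and sample data are all hypothetical.

```python
import random

MIN_COUNT = 5    # hypothetical threshold below which an answer is suppressed
NOISE_SD = 1.0   # hypothetical standard deviation of the added noise


def anonymized_count(rows, predicate, rng):
    """Answer a count query over the raw data, anonymizing only the result."""
    true_count = sum(1 for row in rows if predicate(row))
    if true_count < MIN_COUNT:
        return None  # too few individuals: refuse to answer
    noisy = true_count + rng.gauss(0, NOISE_SD)
    return max(0, round(noisy))


rng = random.Random(0)  # seeded only to make the example repeatable

# Made-up raw data, stored unmodified.
rows = [{"city": "Berlin"}] * 40 + [{"city": "Bonn"}] * 2

berlin = anonymized_count(rows, lambda r: r["city"] == "Berlin", rng)
bonn = anonymized_count(rows, lambda r: r["city"] == "Bonn", rng)
print("Berlin:", berlin)
print("Bonn:", bonn)
```

The underlying data stays intact, and the protection is applied to each answer instead. A sketch this naive would not hold up against a determined attacker (fresh noise, for instance, can be averaged away by repeating the same query), which is one reason production tools are considerably more elaborate.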

As companies seek solutions for protecting personal data as part of their GDPR readiness plans, they must recognize that, whichever route they choose, it will not be a once-and-done task. The WP29 opinion concludes, “anonymisation and re-identification are active fields of research and new discoveries are regularly published […] Thus, anonymisation should not be regarded as a one-off exercise and the attending risks should be reassessed regularly by data controllers.” This is the reason Aircloak has an active R&D cooperation with the German Max Planck Institute to identify new attacks and implement defenses against them. In addition, the company runs an active bug bounty program, the Aircloak Challenge, in which academics, data scientists, privacy engineers, and white-hat hackers are rewarded for the new attacks they find, allowing the company to continuously improve its anonymization methods.

With one week to go before the GDPR goes into effect, no one is entirely sure what enforcement will look like, so organizations must use sound judgment in determining how to provide “appropriate measures” for protecting personal data. By choosing their solutions strategically, they can prepare for GDPR while also continuing to derive business value from their data.

If you have questions about GDPR readiness, just give us a call.

Kevin Moos, May 2018