Introduction
At a time when data protection is becoming a legal and technological priority, more and more institutions are turning to anonymization as a method of securing information. Regulations such as the GDPR require effective data security, and anonymization seems to be an ideal solution. In theory, anonymized data makes it impossible to identify an individual. In practice, however, a small amount of additional information is often enough to reveal someone’s identity.
What does de-anonymization consist of?
De-anonymization is the process of re-associating data with a specific person even though direct identifiers have been removed from it. The most common way to do this is to use so-called quasi-identifiers, that is, pieces of information that do not allow identification on their own but become sufficient when combined with other data sources. A classic example is the combination of date of birth, ZIP code and gender. As Latanya Sweeney showed as early as 2000, this combination is enough to uniquely identify about 87% of the US population.
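The linkage described above can be illustrated with a minimal sketch. All records, names and field names below are hypothetical; the "public" list stands in for any named data source, such as a voter register, that shares the same quasi-identifiers with the released data.

```python
from collections import defaultdict

# Hypothetical "anonymized" release: names removed, quasi-identifiers kept.
released = [
    {"birth_date": "1984-03-12", "zip": "02139", "gender": "F", "diagnosis": "asthma"},
    {"birth_date": "1990-07-01", "zip": "02139", "gender": "M", "diagnosis": "diabetes"},
    {"birth_date": "1975-11-30", "zip": "02140", "gender": "F", "diagnosis": "migraine"},
]

# Hypothetical public source that still contains names.
public = [
    {"name": "Alice Example", "birth_date": "1984-03-12", "zip": "02139", "gender": "F"},
    {"name": "Bob Example", "birth_date": "1990-07-01", "zip": "02139", "gender": "M"},
    {"name": "Carol Example", "birth_date": "1962-01-05", "zip": "02141", "gender": "F"},
]

QUASI_IDENTIFIERS = ("birth_date", "zip", "gender")


def link(released_rows, public_rows, keys=QUASI_IDENTIFIERS):
    """Pair each released row with a name when exactly one public person shares its quasi-identifiers."""
    index = defaultdict(list)
    for person in public_rows:
        index[tuple(person[k] for k in keys)].append(person["name"])

    matches = []
    for row in released_rows:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match means re-identification
            matches.append((candidates[0], row))
    return matches


reidentified = link(released, public)
for name, row in reidentified:
    print(name, "->", row["diagnosis"])
```

In this toy data set, two of the three released rows match exactly one named person, so their diagnoses are exposed even though the release contains no names.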
De-anonymization techniques are based on, among other things, data correlation and the analysis of behavioral patterns. Importantly, artificial intelligence tools are increasingly being used for this purpose, automating and speeding up the process.
High-profile cases of de-anonymization
MIT and Harvard
One of the most famous cases is a study published in 2013 by researchers from MIT and Harvard, who showed that anonymized genetic data of men could be linked to names by analyzing a genealogy database. Identification was possible even though the data contained no first name, last name or national identification number (such as the Polish PESEL).
Netflix
Another example is Netflix’s 2006 release of data on users’ movie ratings. Although names had been removed, researchers were able to link some profiles to accounts on IMDb, which made it possible to identify specific individuals and their preferences.
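A heavily simplified sketch of this kind of profile matching is shown below. It is not the method the researchers used. The data, the `similarity` score and the `min_gap` threshold are assumptions chosen only to show the idea: a handful of known ratings can single out one record in a sparse data set.

```python
# Hypothetical anonymized release: pseudonymous user id -> {movie: rating}.
anonymized = {
    "user_1041": {"Movie A": 5, "Movie B": 2, "Movie C": 4, "Movie D": 1},
    "user_2290": {"Movie A": 3, "Movie E": 5, "Movie F": 4},
    "user_3175": {"Movie B": 2, "Movie C": 5, "Movie G": 3},
}

# Hypothetical ratings that a named person posted publicly on another site.
public_profile = {"Movie A": 5, "Movie C": 4, "Movie D": 1}


def similarity(candidate, known, tolerance=1):
    """Count movies present in both profiles whose ratings differ by at most `tolerance`."""
    return sum(
        1
        for movie, rating in known.items()
        if movie in candidate and abs(candidate[movie] - rating) <= tolerance
    )


def best_match(release, known, min_gap=2):
    """Return the user id that beats the runner-up by at least `min_gap`, else None.

    Assumes the release contains at least two users.
    """
    scores = sorted(
        ((similarity(ratings, known), uid) for uid, ratings in release.items()),
        reverse=True,
    )
    (top, uid), (second, _) = scores[0], scores[1]
    return uid if top - second >= min_gap else None


match = best_match(anonymized, public_profile)
print(match)
```

Requiring a clear gap between the best and second-best candidate is one simple way to avoid claiming a match when several records fit equally well.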
Medicare
In Australia, Medicare data that had been deemed anonymized was published in 2016. However, researchers showed that it could easily be re-identified using publicly available information, revealing citizens’ medical data.
Why simple anonymization is not enough
Many organizations think that hiding names or blurring a section of text is enough to protect data. However, this kind of “visual” anonymization does not protect the layer of data hidden in the document structure. In PDF files, for example, the data can remain in the text layer even when it has been covered with a colored rectangle.
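The sketch below illustrates why a covering rectangle is not enough. It uses a hand-written, uncompressed fragment of a PDF content stream, which is an assumption made for readability: real files usually compress their streams and may encode text differently, so a dedicated PDF library would be needed in practice. The name in the fragment is a fictional placeholder.

```python
import re

# Simplified PDF content stream: the text is drawn first, then a black
# rectangle is painted over the same area. The text operand is still present.
content_stream = b"""
BT
/F1 12 Tf
72 700 Td
(Patient: Jan Kowalski) Tj
ET
0 0 0 rg
70 695 160 16 re
f
"""


def extract_text_operands(stream: bytes) -> list[str]:
    """Return the literal strings passed to the Tj (show text) operator.

    Does not handle escaped parentheses or hex strings; illustration only.
    """
    return [m.decode("latin-1") for m in re.findall(rb"\((.*?)\)\s*Tj", stream)]


recovered = extract_text_operands(content_stream)
print(recovered)
```

The rectangle (`re` followed by `f`) changes only what is rendered on the page. The string passed to `Tj` stays in the file and can be read by any tool that parses the stream.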
True anonymization requires transforming data in an irreversible way and, moreover, testing its effectiveness under real-world attacks. The possibility of merging data sets and using AI techniques to reconstruct missing information should be taken into account.
Recommendations for organizations
To effectively protect personal data, organizations should:
- use advanced anonymization tools, such as Bluur.ai,
- eliminate quasi-identifiers or transform them into aggregated forms,
- test susceptibility to de-anonymization,
- train legal and IT teams in privacy protection techniques,
- document the anonymization process in the context of GDPR audits.
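As an illustration of the second and third recommendations, the sketch below coarsens quasi-identifiers and then measures the result with k-anonymity, the size of the smallest group of records that share the same quasi-identifier values. K-anonymity is one commonly used metric, not one prescribed by this article. The records, the field names and the choice of birth decade and 3-digit ZIP prefix are assumptions.

```python
from collections import Counter

# Hypothetical records containing only quasi-identifiers.
records = [
    {"birth_date": "1984-03-12", "zip": "02139", "gender": "F"},
    {"birth_date": "1986-09-02", "zip": "02138", "gender": "F"},
    {"birth_date": "1990-07-01", "zip": "02139", "gender": "M"},
    {"birth_date": "1993-01-20", "zip": "02134", "gender": "M"},
]


def generalize(record):
    """Coarsen quasi-identifiers: birth date -> decade, ZIP code -> 3-digit prefix."""
    decade = int(record["birth_date"][:4]) // 10 * 10
    return {
        "birth_decade": f"{decade}s",
        "zip_prefix": record["zip"][:3] + "**",
        "gender": record["gender"],
    }


def k_anonymity(rows, keys):
    """Return the size of the smallest group sharing the same values for `keys`."""
    groups = Counter(tuple(row[k] for k in keys) for row in rows)
    return min(groups.values())


k_raw = k_anonymity(records, ("birth_date", "zip", "gender"))
generalized = [generalize(r) for r in records]
k_generalized = k_anonymity(generalized, ("birth_decade", "zip_prefix", "gender"))
print(k_raw, k_generalized)
```

In this toy data set every raw record is unique (k = 1), while after generalization each record shares its values with one other record (k = 2). A higher k makes the kind of linkage described earlier harder, but it does not on its own prove that data is safe, so it is best treated as one test among several.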
Summary
Anonymization is an effective data protection tool only if it is done with due diligence and an understanding of the technical risks. History shows that even data that appears safe can reveal more than we assume. Therefore, organizations should treat anonymization as a strategic process that requires knowledge, tools and accountability. Only in this way can the real risks of de-anonymization be avoided.