Judicial guidelines on anonymization
Polish courts use a set of detailed rules that define what the data anonymization process should look like. The documentation analyzed in this article is based on guidelines from 13 courts, including the Court of Appeals in Lublin, the District Court in Warsaw and the District Court in Gliwice. The analysis showed that the guidelines are very similar and based on common examples.
What is data anonymization?
Data anonymization involves converting information that identifies individuals, companies or places into forms that make them unrecognizable. The process includes:
1. Converting data to initials, e.g. “Jan Kowalski” → “J. K.”.
2. Using initials with an ellipsis, e.g. “Warsaw” → “W.”.
3. Replacing data with an ellipsis, e.g. “Ulica Krakowska” → “ul. (…)”.
Anonymization is applied to both individuals and other categories of data, such as identification numbers, addresses or geographic names.
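The three basic substitutions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, inputs are assumed to be non-empty strings, and the guidelines themselves do not prescribe any implementation.

```python
ELLIPSIS = "(…)"  # the placeholder used in the examples above


def to_initials(name: str) -> str:
    """Replace every word with its initial, e.g. a full personal name."""
    return " ".join(f"{part[0].upper()}." for part in name.split())


def to_first_initial(name: str) -> str:
    """Keep only the initial of the first word, e.g. for a place name."""
    return f"{name.split()[0][0].upper()}."


def to_ellipsis(kept_prefix: str = "") -> str:
    """Replace the phrase with an ellipsis, optionally keeping a generic prefix."""
    return f"{kept_prefix} {ELLIPSIS}".strip()


print(to_initials("Jan Kowalski"))
print(to_first_initial("Warsaw"))
print(to_ellipsis("ul."))
```

Keeping the substitutions as separate small functions makes it easy to pick a different one per data category.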
Categories of data subject to anonymization
1. Individuals
Names are converted to initials, e.g. “Anna Nowak” → “A. N.” In situations where multiple individuals with the same initials appear in a document, numbering is used, e.g. “J. K. (1)” and “J. K. (2)”.
Exceptions:
– Data of judges, court recording clerks and prosecutors remain public.
– Authors of cited books and scientific articles are also not anonymized.
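The numbering rule for repeated initials, together with the exceptions, could be sketched as follows. The function names and the `exempt` parameter are hypothetical; the sketch assumes that the names have already been detected in the document and that the caller knows which of them belong to judges, clerks, prosecutors or cited authors.

```python
from collections import Counter


def initials(name: str) -> str:
    """Turn a full name into initials."""
    return " ".join(f"{part[0].upper()}." for part in name.split())


def label_individuals(names, exempt=frozenset()):
    """Map each distinct name to its anonymized label.

    Names listed in `exempt` are left unchanged. When several distinct
    names share the same initials, they are numbered in order of first
    appearance.
    """
    unique = list(dict.fromkeys(names))  # drop repeats, keep order
    totals = Counter(initials(n) for n in unique if n not in exempt)
    seen = Counter()
    labels = {}
    for name in unique:
        if name in exempt:
            labels[name] = name
            continue
        ini = initials(name)
        if totals[ini] > 1:
            seen[ini] += 1
            labels[name] = f"{ini} ({seen[ini]})"
        else:
            labels[name] = ini
    return labels


labels = label_individuals(
    ["Jan Kowalski", "Anna Nowak", "Józef Kot", "Jan Kowalski"]
)
print(labels)
```

Building the mapping once per document keeps the numbering stable: every later mention of the same person receives the same label.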
2. Localities
Place names are converted to initials, e.g. “Kraków” → “K.”. For two-part names, only the first part is used, e.g. “Kąty Wrocławskie” → “K.”.
Exceptions:
– The names of cities that are the seats of courts, e.g. “Court of Appeals in Wrocław,” remain unchanged.
– Cities mentioned in the place of publication of books are not anonymized.
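A rough sketch of the place-name rule and the court-seat exception is shown below. It assumes that a list of place names is already available (for instance from a named-entity recognizer) and that court names follow the English patterns in the hypothetical `COURT_PREFIXES` tuple; the exception for places of publication is not handled.

```python
import re

# Hypothetical phrases that mark a city as the seat of a court.
COURT_PREFIXES = ("Court in ", "Court of Appeals in ")


def anonymize_places(text: str, places: list[str]) -> str:
    """Replace known place names with the initial of their first word."""
    # Try longer names first so multi-word names are matched as a whole.
    ordered = sorted(places, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, ordered)) + r")\b")

    def replace(match: re.Match) -> str:
        # Leave the city untouched when it is part of a court's name.
        if text[: match.start()].endswith(COURT_PREFIXES):
            return match.group(0)
        return match.group(0).split()[0][0] + "."

    return pattern.sub(replace, text)


sample = (
    "He moved from Kąty Wrocławskie to Wrocław; "
    "the Court of Appeals in Wrocław ruled."
)
print(anonymize_places(sample, ["Wrocław", "Kąty Wrocławskie"]))
```

Checking the text immediately before each match is a deliberately simple heuristic; a production system would more likely rely on the entity recognizer to label court names as a whole.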
3. Companies, organizations and institutions
Names of companies and organizations are replaced with ellipses, e.g. “XYZ sp. z o.o.” → “(…) sp. z o.o.”. Public institutions, such as “Skarb Państwa” (the State Treasury) or “Naczelny Sąd Administracyjny” (the Supreme Administrative Court), are an exception and are not anonymized.
4. Identification numbers
Identification numbers, such as PESEL, NIP (tax identification number), KRS, or vehicle registration numbers, are replaced with an ellipsis. Automatic detection of more complicated identifiers, such as plot or license numbers, remains difficult.
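For identifiers that are introduced by a label, a regular expression is often enough. The sketch below assumes that the number directly follows one of the labels PESEL, NIP (the Polish tax identification number) or KRS; the pattern and the function name are hypothetical, and unlabeled numbers, vehicle registrations, plot numbers and license numbers are deliberately left out, since they need more context than a regex can provide.

```python
import re

# A label, optional separators, then a run of digits that may contain hyphens.
ID_PATTERN = re.compile(r"\b(PESEL|NIP|KRS)\b([\s:]*)(\d[\d-]*\d)")


def mask_identifiers(text: str) -> str:
    """Replace labeled identification numbers with an ellipsis, keeping the label."""
    return ID_PATTERN.sub(lambda m: f"{m.group(1)}{m.group(2)}(…)", text)


sample = "PESEL 44051401359, NIP: 123-456-32-18, KRS 0000123456"
print(mask_identifiers(sample))
```

Keeping the label in the output preserves readability: the reader still learns which kind of identifier was cited, but not its value.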
5. Addresses
Addresses are replaced with ellipses, leaving only generic designations such as “street” or “square.”
6. Geographic names
Geographic adjectives, e.g. “Lower Silesian,” are replaced with an ellipsis. Geographic names in noun form are replaced with initials.
Categories of non-anonymized data
Not all data is subject to anonymization. Exceptions include:
1. Legal terminology: Laws, interpretations or court arguments remain public.
2. Names of public institutions: Courts, ministries or international organizations such as UNESCO are not anonymized.
3. Timestamps: Years, months and days remain public unless they refer to an individual’s date of birth.
4. Content of the kinds listed in the table below, which does not identify the subject or threaten their legal interests. Leaving it in place lets interested parties better understand the circumstances of the case and how the courts apply the law.
Examples of non-anonymized data
Phrase type | Example phrases |
Names of parts of an institution (departments, divisions, chairs, establishments, etc.) | Department of Mathematics and Computer Science, Medical Intensive Care Unit |
Ordinal numbers and counts of events, persons or objects | dealt 23 knife blows, had 2 sons and 3 daughters |
Profession performed (or learned), official position and functions held | performed the duties of the Chief Accountant in the company, first worked as a veterinarian and then as an IT specialist |
Diseases and types of therapy | refused to come to work due to upper respiratory tract inflammation, forced to undergo otolaryngologic surgery to open the anterior wall of the trachea |
Names of countries | He was staying in Germany until May , a privately owned apartment in France with an area of |
Numbers with units | purchased corn in the amount of 5,000 tons, usable area of 80 sq. m. |
The role of humans in the process of anonymization
Despite technological advances, humans play a key role in anonymizing documents:
1. Decision to publish: A court employee decides whether a document can be released after anonymization.
2. Error correction: Critical errors, such as revealing the identity of individuals, must be corrected.
3. Approval: The final decision on publication rests with a human.
Options for anonymization in court documents
When anonymizing court documents, it is important to remember that some phrases can be anonymized in more than one way. Anonymization algorithms based on natural language processing often offer several substitution options, and the choice among them affects both the readability and the security of the text.
Examples of anonymization options
- Phrases related to organizations and institutions:
Rainbow Sports Club | Law Office of Janina Nowak |
The club (…) | The law firm (…) |
Sports Club (…) | The Law Firm (…) |
 | Law Office of J. N. |
- Multi-word phrases containing company names or locations
Olesnica Railway Rolling Stock Repair Plant S.A. |
Plant (…) S.A. |
Plant (…) in O. S.A. |
In such cases, court staff usually leave the choice to the algorithm, as long as the option used does not compromise the security or readability of the text.
Differences in data classification
Anonymization is not limited to replacing phrases with initials or ellipses. Algorithms often misclassify data: a given phrase (e.g. a personal name, a company name, a city) can be classified in different ways, resulting in different anonymization variants.
Examples:
Mercedes Benz | Arthur Andersen | Kazimierz Dolny |
As a person: “M. B.” | As a person: “A. A.” | As a person: “K. D.” |
As the make of the car: “M. (…)” | As a company: “(…)” | As a city: “K.” |
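The table can be reproduced with a small function that applies a different substitution depending on the assigned class. The class labels (`person`, `city`, `company`, `car_make`) and the function name are hypothetical; the classification itself is assumed to come from an upstream NLP model.

```python
def anonymize_as(phrase: str, entity_type: str) -> str:
    """Return the anonymized form of `phrase` for a given classification."""
    words = phrase.split()
    if entity_type == "person":
        # Every part of the name becomes an initial.
        return " ".join(f"{word[0]}." for word in words)
    if entity_type == "city":
        # Only the first part of the place name is kept as an initial.
        return f"{words[0][0]}."
    if entity_type == "company":
        return "(…)"
    if entity_type == "car_make":
        return f"{words[0][0]}. (…)"
    raise ValueError(f"unknown entity type: {entity_type}")


for phrase, kinds in [
    ("Mercedes Benz", ["person", "car_make"]),
    ("Arthur Andersen", ["person", "company"]),
    ("Kazimierz Dolny", ["person", "city"]),
]:
    for kind in kinds:
        print(phrase, kind, anonymize_as(phrase, kind))
```

The same input produces visibly different output depending on the class, which is why a misclassification shows up directly in the published text.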
Anonymization errors
Although NLP algorithms are becoming more advanced, they can still make mistakes due to:
Incorrect classification of phrases:
- The algorithm may misclassify the phrase, leading to incorrect anonymization. For example:
- “Kazimierz Dolny” as a person (“K. D.”) instead of as a city (“K.”).
Suboptimal choice of anonymization option:
- The choice of the form of anonymization may affect the readability or security of the text.
Procedure for dealing with errors
In situations where the algorithm makes a non-ideal choice, employees are required to accept the anonymization result as long as it meets two key criteria:
- It does not increase the risk of identifying individuals or institutions.
- It does not lead to a significant decrease in the readability of the text.
In practice, this means that a court employee must assess whether a given anonymization option is sufficiently secure and readable, and if so, approve it even in the case of minor inaccuracies.
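The acceptance rule can be written down as a simple check. The `Review` record and its field names are hypothetical; the two risk flags stand for the employee's own judgment and are not computed automatically.

```python
from dataclasses import dataclass


@dataclass
class Review:
    """A court employee's assessment of one anonymization result."""
    increases_identification_risk: bool
    significantly_reduces_readability: bool
    is_optimal: bool = True  # informational only; does not affect acceptance


def should_accept(review: Review) -> bool:
    """Accept the result unless it fails one of the two key criteria."""
    return not (
        review.increases_identification_risk
        or review.significantly_reduces_readability
    )


# A suboptimal but safe and readable result is still accepted.
print(should_accept(Review(False, False, is_optimal=False)))
# A result that exposes someone's identity is rejected.
print(should_accept(Review(True, False)))
```

Note that `is_optimal` plays no part in the decision, mirroring the rule that minor inaccuracies alone are not grounds for rejection.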
The importance of accountability in the anonymization process
Successful anonymization of court documents requires a responsible approach that combines automation with thoughtful human intervention. Algorithms offer a variety of anonymization options, but it is the judgment of staff that ensures that data remains both secure and understandable. It is critical that decisions in the anonymization process minimize the risk of identification while preserving the legibility of documents so they can be legally shared.