Comparison of Bluur and NASK Anonymization Software

Anonymizing sensitive information in documents is crucial for protecting privacy and ensuring compliance with regulations. In this article we compare Bluur with the anonymization tool from NASK (National Research Institute). We analyze the performance of both tools across approximately 90 documents, such as ID cards, registration certificates, and residency cards.

Patryk Gabryś
Bluur® Team

Overview of the Metrics 

Time: Time taken to process each document in seconds.

Found: Number of sensitive elements correctly identified.

Missed: Number of sensitive elements missed by the software.

Incorrect: Incorrect detections where non-sensitive data was flagged.
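
The four metrics above can be turned into per-document rates. The sketch below is only an illustration, not the scoring code used for this comparison. The `DocumentResult` type and its field names are hypothetical. Found and missed rates are taken relative to all sensitive elements in a document, and the incorrect-detection rate is assumed here to be relative to all flagged areas.

```python
from dataclasses import dataclass


@dataclass
class DocumentResult:
    """Hypothetical per-document record holding the four metrics."""
    time_s: float   # Time: processing time in seconds
    found: int      # Found: sensitive elements correctly identified
    missed: int     # Missed: sensitive elements not detected
    incorrect: int  # Incorrect: non-sensitive elements that were flagged


def found_rate(r: DocumentResult) -> float:
    """Share of the document's sensitive elements that were detected."""
    total = r.found + r.missed
    return r.found / total if total else 0.0


def missed_rate(r: DocumentResult) -> float:
    """Share of the document's sensitive elements that were not detected."""
    total = r.found + r.missed
    return r.missed / total if total else 0.0


def incorrect_rate(r: DocumentResult) -> float:
    """Assumption: share of all flagged areas that were false positives."""
    flagged = r.found + r.incorrect
    return r.incorrect / flagged if flagged else 0.0


# Made-up example values, not taken from the test set.
example = DocumentResult(time_s=1.0, found=9, missed=1, incorrect=1)
print(f"found {found_rate(example):.0%}, missed {missed_rate(example):.0%}, "
      f"incorrect {incorrect_rate(example):.0%}")
```

With this definition the found and missed rates of a document always add up to 100%, which matches how the group tables in this article are laid out.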

Summary of Findings Across 90 Documents

The dataset includes various types of documents with differing levels of complexity.

Below is a summary of the performance of Bluur and NASK across all documents. 

Accuracy of Found Elements

Bluur generally had a higher detection rate across all document types, especially for more complex documents.

Bluur: Successfully detected 85-100% of sensitive data per document on average, showing strong performance across simple and complex documents.

NASK: Lagged behind, detecting 40-60% of sensitive data per document on average, often missing crucial data in more complex forms.

Missed elements 

Bluur: Missed 0% to 10% of elements across all documents, keeping the risk of exposing sensitive data relatively low.

NASK: Missed 10% to 45% of elements with a notably higher miss rate in complex documents.

Incorrect Detections 

Incorrect detections are false positives: non-sensitive elements incorrectly flagged as sensitive, reported here as a percentage.

Bluur: Incorrect detections ranged from 0% to 5%.

NASK: Incorrect detections ranged from 5% to 20%.

Processing Time 

Time is measured from uploading the file to receiving the classified document.

Bluur: Average data classification time was 1 second, with a few more complex documents taking up to 2 seconds.

NASK: Average data classification time was 2 seconds, with a few more complex documents taking 5 to 10 seconds.
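
A wall-clock measurement of this kind can be sketched as follows. This is a minimal illustration under stated assumptions: `timed` and `fake_classify` are hypothetical names, and `fake_classify` merely stands in for a real upload-and-classify call, since neither tool's API is described in this article.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(classify: Callable[[bytes], T], document: bytes) -> Tuple[T, float]:
    """Run classify(document) and return its result with the elapsed seconds."""
    start = time.perf_counter()  # monotonic clock suited to measuring intervals
    result = classify(document)
    elapsed = time.perf_counter() - start
    return result, elapsed


def fake_classify(document: bytes) -> list:
    # Placeholder (assumption): a real call would upload the file
    # and return the areas classified as sensitive.
    return []


areas, seconds = timed(fake_classify, b"example document bytes")
print(f"classified {len(areas)} areas in {seconds:.3f} s")
```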

Average statistics for document groups: Bluur

Document group    | Found elements | Missed elements | Incorrect detections | Processing time
Personal document | 91%            | 9%              | 3%                   | 1 second
Invoice           | 97%            | 3%              | 4%                   | 1 second
Statements        | 93%            | 7%              | 2%                   | 1 second
Correspondence    | 92%            | 8%              | 7%                   | 1 second

Average statistics for document groups: NASK

Document group    | Found elements | Missed elements | Incorrect detections | Processing time
Personal document | 62%            | 38%             | 16%                  | 2 seconds
Invoice           | 32%            | 68%             | 12%                  | 2 seconds
Statements        | 10%            | 90%             | 4%                   | 3 seconds
Correspondence    | 48%            | 52%             | 10%                  | 3 seconds
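
To compare the two tools at a glance, the group figures from the two tables above can be averaged. The sketch below copies the percentages as published. The `averages` helper is a hypothetical name, and the result is an unweighted mean over the four groups, because the number of documents per group is not given here.

```python
from statistics import mean

# (found %, missed %, incorrect %) per document group, copied from the tables.
BLUUR = {
    "Personal document": (91, 9, 3),
    "Invoice": (97, 3, 4),
    "Statements": (93, 7, 2),
    "Correspondence": (92, 8, 7),
}
NASK = {
    "Personal document": (62, 38, 16),
    "Invoice": (32, 68, 12),
    "Statements": (10, 90, 4),
    "Correspondence": (48, 52, 10),
}


def averages(groups: dict) -> tuple:
    """Unweighted mean of found, missed and incorrect percentages over groups."""
    found, missed, incorrect = zip(*groups.values())
    return mean(found), mean(missed), mean(incorrect)


for name, groups in (("Bluur", BLUUR), ("NASK", NASK)):
    found, missed, incorrect = averages(groups)
    print(f"{name}: found {found:.2f}%, missed {missed:.2f}%, "
          f"incorrect {incorrect:.2f}%")
```

A mean weighted by the number of documents in each group would be more representative of the full test set, but it needs the group sizes.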

Detailed Findings Across Various Document Types

Polish ID Card

Bluur classification
NASK classification

Bluur classified 10 areas as sensitive data. Those include personal data, expiry dates and a signature.

NASK classified 7 areas, with nationality mistakenly classified, and found neither the signature nor the smaller photo of the holder.

Both tools classified the document in approximately 1 second.

Profit and loss statement

Bluur classification
NASK classification

Bluur classified 70 areas as sensitive data. Those include personal data, monetary amounts and company details.

NASK classified 18 areas, including address and company data, as well as names and signatures classified as a single block at the bottom of the document. Incorrectly classified areas include the word “sporządziła” (“prepared by”), which was classified as a person.

Bluur classified the document in approximately 1 second and NASK did it in 3 seconds.

Certificate of Election of the Mayor

Bluur classification
NASK classification

Bluur classified 16 areas as sensitive data. Those include personal data, an official seal and handwritten signatures.

NASK classified 5 areas: only dates and one name. It missed the city, which is mentioned several times in the text, as well as the handwritten signatures and the seal.

Bluur classified the document in approximately 1 second and NASK did it in 2 seconds.

Comparing the Performance of Bluur and NASK

Bluur’s classification outperforms NASK’s on both simple and complex documents. The main highlight is Bluur’s ability to detect handwriting and table contents, while NASK struggles even with very basic documents that contain printed text. Bluur’s shorter classification times give it an even bigger advantage over NASK when dealing with larger amounts of data.
