kb-Anonymity-Data-Protection-and-Privacy

Final project for the Data Protection & Privacy course: an implementation of the kb-anonymity technique, a framework for anonymizing data for testing purposes.

Getting started

To run the project, it is sufficient to clone or download this repository with the command:

git clone https://github.com/A-725-K/kb-Anonymity-Data-Protection-and-Privacy.git

Our project relies on the Z3 solver. If you don't have it installed, please refer to its main page.

How to launch the program

You only have to run this simple command from your terminal:

python3 main.py [-h] -i INPUT_FILE -o OUTPUT_FILE -a ALGORITHM -k K -c CONFIG_FILE

where:

  • -i: the input dataset, in JSON format
  • -o: the output file, which will be written in JSON format
  • -a: the technique to apply to enhance the anonymization of the data
    • P-F: same Path, no Field repeat
    • P-T: same Path, no Tuple repeat
  • -k: the degree of anonymization to apply to the data
  • -c: a configuration file that contains the range constraints to apply to the fields of the tuples in the dataset
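The option list above can be sketched with argparse. This is only an illustration of the documented interface, not the actual contents of main.py: the `dest` names, the help strings, and the file paths in the example call are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the usage line shown above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("-i", dest="input_file", metavar="INPUT_FILE",
                        required=True, help="input dataset in JSON format")
    parser.add_argument("-o", dest="output_file", metavar="OUTPUT_FILE",
                        required=True, help="output file, written as JSON")
    # The two documented techniques: P-F and P-T.
    parser.add_argument("-a", dest="algorithm", metavar="ALGORITHM",
                        required=True, choices=["P-F", "P-T"],
                        help="anonymization technique")
    parser.add_argument("-k", dest="k", metavar="K", type=int,
                        required=True, help="degree of anonymization")
    parser.add_argument("-c", dest="config_file", metavar="CONFIG_FILE",
                        required=True, help="file with the range constraints")
    return parser


# Hypothetical invocation; the paths are placeholders, not files from the repo.
args = build_parser().parse_args(
    ["-i", "input.json", "-o", "output.json", "-a", "P-T", "-k", "3", "-c", "configs.txt"]
)
```

Restricting `-a` with `choices` makes an unsupported technique fail at parse time rather than later in the run.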

Alternatively, you can simply launch the test_runner utility:

cd utilities
./test_runner.sh

How the repo works

  • datasets: contains all the data used in our experiments, and a bash script to gather them through an open API
  • kb_anonymity: the core of the program; it contains the library we propose
  • mappings: each file contains a map that translates some values into integers
  • main.py: the entry point of the program; users may want to modify it depending on their needs
  • p_test.py: the SUT (system under test); users have to encode their own program in the same form
  • stat: contains plots of the results produced by the test runner
  • utilities
    • configs.txt: an example configuration file; it must follow a specific syntax
    • json_reader.py: a utility to parse the dataset; users should modify it depending on their data
    • draw_graphics.py: a script that plots the results of the algorithms executed in batch
    • test_runner.sh: a simple script that runs some experiments with different parameters to understand the behavior of the algorithm
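As an illustration of what a file in mappings represents, the sketch below turns categorical values into integers and back. It is not the repository's code: the function names and the city values are hypothetical, and the assumption is that integer codes are wanted so that constraints on such fields can be expressed numerically.

```python
def build_mapping(values):
    """Assign each distinct value an integer code, in sorted order."""
    return {value: code for code, value in enumerate(sorted(set(values)))}


def encode(values, mapping):
    """Replace every value with its integer code."""
    return [mapping[value] for value in values]


def decode(codes, mapping):
    """Invert the mapping to recover the original values."""
    inverse = {code: value for value, code in mapping.items()}
    return [inverse[code] for code in codes]


# Hypothetical field values, used only for this example.
cities = ["Genoa", "Milan", "Genoa", "Rome"]
city_map = build_mapping(cities)
encoded = encode(cities, city_map)
decoded = decode(encoded, city_map)
```

Sorting before enumerating keeps the codes stable across runs for the same set of values.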

1. p_test format
p_test must contain a function called P_Test, which simulates the behaviour of the system we want to test. It takes a raw tuple and a list of constraints (initially empty) as input. A constraint is a triple (field, operation symbol, value).
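A minimal sketch of such a function follows. Only the name P_Test, its two inputs, and the triple format come from the description above; the field names, the thresholds, the use of a dict for the raw tuple, and returning the constraint list are all assumptions made for illustration.

```python
def P_Test(raw_tuple, constraints):
    """Toy SUT: record one constraint for every branch taken on the tuple.

    raw_tuple is assumed to be a dict keyed by field name; "age" and
    "income" are hypothetical fields.
    """
    if raw_tuple["age"] >= 18:
        constraints.append(("age", ">=", 18))
        if raw_tuple["income"] > 50000:
            constraints.append(("income", ">", 50000))
        else:
            constraints.append(("income", "<=", 50000))
    else:
        constraints.append(("age", "<", 18))
    # Returning the list is an assumption; it is also modified in place.
    return constraints


# The constraint list starts empty and ends up describing the path taken.
path = P_Test({"age": 30, "income": 20000}, [])
```

Two tuples that produce the same list of triples follow the same path through the program, which is the notion of "same Path" that the P-F and P-T options refer to.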

2. configs format
In this file the user specifies the range constraints for each field of a tuple. The first row must contain all the fields present in the dataset, as strings. Each subsequent row must follow one of two syntaxes. If the constraints relate to a single field:

field:(([op_symbol value]+),?)+

otherwise, if the constraints involve two related fields:

#field1 op_symbol field2

A comma separates conditions that are combined with OR, while whitespace separates conditions that are combined with AND.
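The two row syntaxes can be read with a small parser such as the sketch below. It is not the parser shipped in the repository: the set of operator symbols, the restriction to integer values, the optional whitespace between symbol and value, and the example field names are assumptions. The first row of the file (the field names) is not handled here.

```python
import re

# Operator symbols assumed for illustration; two-character symbols come first
# so that ">=" is not read as ">".
_OPS = r"<=|>=|==|!=|<|>"
_CONDITION = re.compile(rf"({_OPS})\s*(-?\d+)")
_RELATION = re.compile(rf"#\s*(\w+)\s*({_OPS})\s*(\w+)\s*$")


def parse_line(line):
    """Parse one constraint row of the configuration file."""
    line = line.strip()
    if line.startswith("#"):
        # Row relating two fields: "#field1 op_symbol field2".
        match = _RELATION.match(line)
        if match is None:
            raise ValueError(f"malformed relation row: {line!r}")
        return ("relation", match.groups())
    # Row for a single field: comma-separated OR groups of AND conditions.
    field, _, body = line.partition(":")
    alternatives = []
    for group in body.split(","):
        conditions = [(op, int(value)) for op, value in _CONDITION.findall(group)]
        if conditions:
            alternatives.append(conditions)
    return ("range", field.strip(), alternatives)


# Hypothetical rows: age in [0, 17] OR age >= 65, and min_age <= max_age.
single = parse_line("age:>= 0 <= 17,>= 65")
related = parse_line("#min_age <= max_age")
```

The result for a single field is a list of alternatives (OR), each of which is a list of conditions that must hold together (AND).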

Authors

  • Andrea Canepa - Computer Science, UNIGE - Data Protection and Privacy a.y. 2019/2020
  • Alessio Ravera - Computer Science, UNIGE - Data Protection and Privacy a.y. 2019/2020
