Computer Science – Databases
Scientific paper
2006-12-21
We consider the privacy problem in data publishing: given a relation I containing sensitive information, 'anonymize' it to obtain a view V such that, on the one hand, attackers cannot learn any sensitive information from V and, on the other hand, legitimate users can use V to compute useful statistics on I. These are conflicting goals. We use a definition of privacy that is derived from existing ones in the literature and relates the a priori probability of a given tuple t, Pr(t), to the a posteriori probability, Pr(t | V), and we propose a novel and quite practical definition of utility. Our main result is the following. Let n denote the size of I and m the size of the domain from which I was drawn (i.e., n < m). When the a priori probability is Pr(t) = Omega(n/sqrt(m)) for some tuple t, there exists no useful anonymization algorithm; when Pr(t) = O(n/m) for all tuples t, we give a concrete anonymization algorithm that is both private and useful. Our algorithm is quite different from the k-anonymization algorithm studied intensively in the literature, and is based on random deletions and insertions to I.
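To make the idea of "random deletions and insertions" concrete, here is a minimal sketch of what such an anonymization could look like. It is not the algorithm from the paper, whose details the abstract does not give. The function names and the two parameters `keep_prob` and `insert_prob` are assumptions introduced only for illustration, and the estimator is a standard debiasing of a count under independent sampling, not a result quoted from the paper.

```python
import random


def anonymize(instance, domain, keep_prob, insert_prob, rng):
    """Build a view V of `instance` by random deletions and insertions.

    Assumption (not from the paper): every tuple of the instance is kept
    independently with probability `keep_prob`, and every other tuple of
    the domain is inserted independently with probability `insert_prob`.
    """
    instance = set(instance)
    view = set()
    for t in domain:
        if t in instance:
            # Random deletion: the tuple survives with probability keep_prob.
            if rng.random() < keep_prob:
                view.add(t)
        elif rng.random() < insert_prob:
            # Random insertion of a tuple that was not in the instance.
            view.add(t)
    return view


def estimate_count(view, query, keep_prob, insert_prob):
    """Estimate how many tuples of `query` are in the hidden instance.

    Since E[|V & Q|] = keep_prob * c + insert_prob * (|Q| - c), where c is
    the true count, solving for c gives the unbiased estimate below.
    Requires keep_prob != insert_prob.
    """
    query = list(query)
    observed = sum(1 for t in query if t in view)
    return (observed - insert_prob * len(query)) / (keep_prob - insert_prob)


if __name__ == "__main__":
    rng = random.Random(0)            # fixed seed so runs are repeatable
    domain = range(10_000)            # m = 10,000 possible tuples
    instance = set(range(0, 500))     # n = 500 tuples, so n < m
    view = anonymize(instance, domain, keep_prob=0.5, insert_prob=0.02, rng=rng)
    query = range(0, 1_000)           # a counting query; the true answer is 500
    print(estimate_count(view, query, keep_prob=0.5, insert_prob=0.02))
```

In this sketch the published view hides whether any individual tuple was really in I, because a tuple in V may have been inserted and a tuple absent from V may have been deleted, while aggregate counts can still be estimated. How the two probabilities must be chosen to meet the paper's privacy and utility definitions is not stated in the abstract.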
Hong Sungho
Rastogi Vibhor
Suciu Dan
The Boundary Between Privacy and Utility in Data Anonymization