The Importance of Data Anonymity


Companies and organizations collect and use large amounts of sensitive user data whose release must be carefully controlled. The rapid advance of information technology over the past few decades has transformed our lives. In particular, the explosive growth of the Internet and e-commerce has enabled people to carry out daily activities online, such as shopping, banking and even consulting a doctor. These prevalent online activities mean that a vast amount of personal data is continuously produced and collected in electronic form. The collected data are an important asset today because they can be used for purposes ranging from scientific research to demographic trend analysis…
Anonymity is an important concept for privacy, and data anonymity is particularly crucial in public databases such as census data or health records collected by government agencies. Data anonymity can also be useful in the private sector, for example, when an organization wishes to allow third parties to access its customer data. In such a case, there is no guarantee that the third parties will always respect the privacy policy attached to the data. The organization must therefore protect its customers' privacy by removing all information that can link data items to individuals. Anonymization algorithms use suppression (in which certain attribute values are replaced by a '*' or another special character) and generalization (in which certain attribute values are replaced by broader categories) of records to satisfy privacy requirements with minimal distortion of…
These attributes are suppressed in the anonymized relation. E.g., name and social security number.
• Quasi-identifier (QI): Attributes that can identify an individual when combined with other information available to an adversary. QI attributes are generalized. E.g., gender, ZIP code, birth date.
• Sensitive attribute: Attributes that cause a privacy breach when associated with a unique individual. E.g., disease or salary.
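
The treatment of each attribute class described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names, the choice to keep three ZIP code digits, and the decade-wide birth date ranges are all hypothetical, and gender is left at its original level of detail for simplicity.

```python
from datetime import date

# Hypothetical classification of columns in a toy table; the names are
# assumptions chosen to match the examples in the text.
IDENTIFIERS = {"name", "ssn"}

def generalize_zip(zipcode: str, keep: int = 3) -> str:
    """Keep the first `keep` characters and mask the rest with '*'."""
    return zipcode[:keep] + "*" * (len(zipcode) - keep)

def generalize_birth_date(iso_date: str) -> str:
    """Replace a full birth date with the decade of birth."""
    year = date.fromisoformat(iso_date).year
    decade = year - year % 10
    return f"{decade}-{decade + 9}"

def anonymize(record: dict) -> dict:
    """Suppress identifiers and generalize two quasi-identifiers."""
    out = {}
    for key, value in record.items():
        if key in IDENTIFIERS:
            out[key] = "*"  # suppression
        elif key == "zipcode":
            out[key] = generalize_zip(value)  # generalization
        elif key == "birth_date":
            out[key] = generalize_birth_date(value)  # generalization
        else:
            out[key] = value  # gender and the sensitive attribute are kept
    return out

record = {
    "name": "Alice Smith",
    "ssn": "123-45-6789",
    "gender": "F",
    "zipcode": "47906",
    "birth_date": "1984-03-12",
    "disease": "flu",
}
anonymized = anonymize(record)
print(anonymized)
```

How far to generalize each quasi-identifier is exactly the utility versus privacy decision: masking more digits or widening the date range hides more, but leaves less usable detail in the released data.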

A key difficulty of data anonymization is that data utility (i.e., data quality) and data privacy are conflicting goals. Hiding more data values enhances privacy but inevitably decreases utility; revealing more data values increases utility but may decrease privacy. A recent approach to this difficulty relies on the notion of k-anonymity. In this approach, data privacy is guaranteed by ensuring that any record in the released data is indistinguishable from at least (k-1) other records with respect to a set of attributes called the quasi-identifier. In other words, a data release satisfies k-anonymity if no released tuple can be related to fewer than k respondents, where k is a positive integer set by the data holder, possibly as the result of a negotiation with other parties. The homogeneity attack and the background knowledge attack can be used to compromise a k-anonymous…
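
The k-anonymity condition can be checked mechanically by grouping released records on their quasi-identifier values and confirming that every group has at least k members. The sketch below assumes a hypothetical table and column names; the helper `homogeneous_groups` is likewise an illustrative assumption, showing one way to spot groups exposed to a homogeneity attack, where every record in a group shares the same sensitive value.

```python
from collections import defaultdict

def group_by_qi(rows, qi):
    """Group records by their tuple of quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[attr] for attr in qi)].append(row)
    return groups

def is_k_anonymous(rows, qi, k):
    """True if every quasi-identifier group contains at least k records."""
    return all(len(group) >= k for group in group_by_qi(rows, qi).values())

def homogeneous_groups(rows, qi, sensitive):
    """Return QI groups whose records all share one sensitive value."""
    return [
        key
        for key, group in group_by_qi(rows, qi).items()
        if len({row[sensitive] for row in group}) == 1
    ]

# Hypothetical released table; values are made up for illustration.
QI = ("gender", "zipcode", "birth_date")
released = [
    {"gender": "F", "zipcode": "479**", "birth_date": "1980-1989", "disease": "flu"},
    {"gender": "F", "zipcode": "479**", "birth_date": "1980-1989", "disease": "flu"},
    {"gender": "M", "zipcode": "478**", "birth_date": "1970-1979", "disease": "cancer"},
    {"gender": "M", "zipcode": "478**", "birth_date": "1970-1979", "disease": "flu"},
]

print(is_k_anonymous(released, QI, 2))
print(is_k_anonymous(released, QI, 3))
print(homogeneous_groups(released, QI, "disease"))
```

In this toy table the release is 2-anonymous, yet the first group still leaks information: anyone known to belong to it must have the single disease value that group contains, even though the individual record cannot be pinpointed.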