An early attempt to protect the privacy of research subjects can be found in the Health Insurance Portability and Accountability Act (HIPAA). That regulation treats a data set as anonymized once the directory information about each subject has been removed. Directory information is data such as name, social security number, or address. The regulation’s Safe Harbor method lists 18 categories of such identifiers; if you remove them, the data set can be shared without violating the privacy regulation.

While removing directory information will keep you within the rules of the regulation, it will not protect the privacy of the individuals in the data set. This was shown in 1997 when Latanya Sweeney re-identified medical records that had been de-identified in accordance with HIPAA’s privacy regulation, including the record of the then-governor of Massachusetts.
Sweeney’s work led to the notion of a quasi-identifier: information about you (e.g., gender or birthdate) that cannot identify you on its own, but that can be combined with other such information into a combination that also appears in another data set. A problem occurs when this second data set (e.g., a voter registration database) also contains some of your directory information. By linking quasi-identifiers across data sets, an adversary can re-identify a record in a “de-identified” data set and discover whatever personal information (e.g., medical treatments) was meant to be kept anonymous.
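
The linking step described above can be sketched in a few lines of Python. This is a minimal illustration, not a reconstruction of Sweeney’s actual procedure: every record, name, and field below is invented, and the choice of ZIP code, birthdate, and gender as the quasi-identifiers is an assumption made for the example. The sketch strips the directory fields from a small medical data set, then joins what remains against a public voter roll on the quasi-identifiers.

```python
# All records below are hypothetical and exist only to illustrate the idea.
DIRECTORY_FIELDS = {"name", "address"}              # assumed directory information
QUASI_IDENTIFIERS = ("zip", "birthdate", "gender")  # assumed quasi-identifiers

medical_records = [
    {"name": "Alice Example", "address": "1 Main St", "zip": "02138",
     "birthdate": "1950-01-15", "gender": "F", "diagnosis": "asthma"},
    {"name": "Bob Example", "address": "2 Elm St", "zip": "02139",
     "birthdate": "1962-07-04", "gender": "M", "diagnosis": "diabetes"},
    {"name": "Carol Example", "address": "3 Oak St", "zip": "02140",
     "birthdate": "1975-03-09", "gender": "F", "diagnosis": "hypertension"},
]

# A public data set that holds names alongside the same quasi-identifiers.
voter_roll = [
    {"name": "Alice Example", "zip": "02138", "birthdate": "1950-01-15", "gender": "F"},
    {"name": "Bob Example", "zip": "02139", "birthdate": "1962-07-04", "gender": "M"},
    {"name": "Carol Example", "zip": "02140", "birthdate": "1975-03-09", "gender": "F"},
    {"name": "Dana Example", "zip": "02140", "birthdate": "1975-03-09", "gender": "F"},
]


def strip_directory_fields(records):
    """Return copies of the records with the directory fields removed."""
    return [{k: v for k, v in r.items() if k not in DIRECTORY_FIELDS}
            for r in records]


def link(deidentified, public):
    """Re-identify records whose quasi-identifiers match exactly one person."""
    names_by_key = {}
    for person in public:
        key = tuple(person[q] for q in QUASI_IDENTIFIERS)
        names_by_key.setdefault(key, []).append(person["name"])

    matches = []
    for record in deidentified:
        key = tuple(record[q] for q in QUASI_IDENTIFIERS)
        names = names_by_key.get(key, [])
        if len(names) == 1:  # a unique combination pins the record to one person
            matches.append((names[0], record["diagnosis"]))
    return matches


released = strip_directory_fields(medical_records)
reidentified = link(released, voter_roll)
print(reidentified)
```

In this toy data the first two released records are re-identified because their combination of quasi-identifiers is unique in the voter roll. The third is not, because two voters share the same combination, which is the intuition behind making quasi-identifier combinations less distinctive before release.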
