Researchers and their teams need to be aware of the policies and processes with which their research data must comply. Where sensitive data cannot be made public for ethical, policy or legal reasons, research teams should consider whether de-identifying the data, i.e. removing direct identifiers, is possible and would allow for safe sharing.
Direct identifiers are those which place study participants at immediate risk of being re-identified. The following list is based on various sources, including guidance from major international funding agencies, the US Health Insurance Portability and Accountability Act (HIPAA) and the British Medical Journal.
Direct identifiers include:
For research projects involving human participants and human biological materials, these decisions must align with UVic's Human Research Ethics requirements.
Method | Description |
Anonymization | Direct and indirect identifiers have been removed or manipulated together with mathematical and technical guarantees to prevent re-identification. Example: Noise (meaningless data) is calibrated to a dataset to hide whether or not an individual is present. |
De-identification | Direct and known indirect identifiers have been removed or manipulated to break the linkage to real-world identities. Example: Data are suppressed, generalized, perturbed, or swapped; e.g., GPA: 3.2 becomes 3.0-3.5, gender: female becomes gender: male. |
Pseudonymization | Information from which direct identifiers have been eliminated or transformed, but indirect identifiers remain intact. Example: Unique, artificial pseudonyms replace direct identifiers; e.g., John Doe becomes 5L7T LX619Z (a unique sequence not used anywhere else). |
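The generalization, suppression and pseudonymization examples in the table above can be sketched in a few lines of Python. This is a minimal illustration only, not a vetted de-identification procedure: the function names, the field names (`name`, `email`, `gpa`), the 0.5-point GPA bin width and the 12-character pseudonym format are all assumptions made for the example, and the records are fictional.

```python
import secrets


def generalize_gpa(gpa, width=0.5):
    """Generalization: replace an exact GPA with the range that contains it."""
    lower = int(gpa / width) * width
    return f"{lower:.1f}-{lower + width:.1f}"


def suppress(record, fields):
    """Suppression: return a copy of the record without the named fields."""
    return {key: value for key, value in record.items() if key not in fields}


def pseudonymize(records, id_field):
    """Pseudonymization: swap a direct identifier for a random, unique code.

    Returns the cleaned records and the lookup table (the "key"). The key
    re-identifies participants, so it must be stored separately and securely.
    """
    lookup = {}
    cleaned = []
    for record in records:
        original = record[id_field]
        if original not in lookup:
            code = secrets.token_hex(6).upper()
            while code in lookup.values():  # regenerate on an (unlikely) collision
                code = secrets.token_hex(6).upper()
            lookup[original] = code
        # Copy the record so the caller's original data is left untouched.
        cleaned.append({**record, id_field: lookup[original]})
    return cleaned, lookup


if __name__ == "__main__":
    # Fictional records, for illustration only.
    participants = [
        {"name": "John Doe", "email": "jd@example.org", "gpa": 3.2},
        {"name": "Jane Roe", "email": "jr@example.org", "gpa": 3.9},
    ]
    no_email = [suppress(p, {"email"}) for p in participants]
    shared, key = pseudonymize(no_email, "name")
    for row in shared:
        row["gpa"] = generalize_gpa(row["gpa"])
    print(shared)
```

Note that the pseudonymized output still contains indirect identifiers, so on its own it does not meet the stronger definitions of de-identification or anonymization given in the table.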
Researchers may consider using algorithm-based tools to help anonymize their data and reduce the risk of re-identification. A range of open-source software is available.
Tool | ARX | Amnesia | Anonimatron |
Website | | | |
Purpose | | | |
System Requirements | | | |
Notable Features | | | |
Limitations | | | |
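To make the idea of re-identification risk more concrete, the sketch below counts how many records share each combination of indirect identifiers and flags combinations held by fewer than k records, a check commonly described as k-anonymity. It is a simplified illustration and does not use the interface of any of the tools listed above; the function names, the field names (`age_range`, `region`) and the thresholds are assumptions chosen for the example.

```python
from collections import Counter


def group_sizes(records, indirect_identifiers):
    """Count how many records share each combination of indirect identifier values."""
    return Counter(
        tuple(record[field] for field in indirect_identifiers) for record in records
    )


def risky_groups(records, indirect_identifiers, k=5):
    """Return the combinations shared by fewer than k records.

    The default threshold k=5 is an illustrative assumption, not a standard.
    """
    sizes = group_sizes(records, indirect_identifiers)
    return {combo: n for combo, n in sizes.items() if n < k}


if __name__ == "__main__":
    # Fictional, already generalized records.
    records = [
        {"age_range": "20-29", "region": "North"},
        {"age_range": "20-29", "region": "North"},
        {"age_range": "30-39", "region": "South"},
    ]
    # The single "30-39 / South" record stands out and may need further
    # generalization or suppression before sharing.
    print(risky_groups(records, ["age_range", "region"], k=2))
    # → {('30-39', 'South'): 1}
```

Small groups are only one indicator of risk; dedicated tools apply additional models and checks, and a result from a sketch like this should not be treated as evidence that a dataset is safe to share.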
Additional Resources
Managing and sharing sensitive data can prove to be a complex undertaking that requires skill and expertise. Consult the following resources to start learning more about how to share sensitive data responsibly.
Sensitive Data Toolkit (Portage)
De-Identification Guidance (Portage)
A Visual Guide to Practical Data De-Identification (Kelsey Finch)
ACRL Primer for Protecting Sensitive Data in Academic Research (Association of College & Research Libraries)
Data Anonymization and De-Identification Guide (University of British Columbia)