Patient Privacy

The document discusses challenges with finding personal health information in free-text clinical narratives and proposes using a BERT model for named entity recognition. It describes the BERT model and how it works, the advantages and disadvantages of using BERT, and the process for de-identifying clinical data, including classifying variables and techniques for masking direct identifiers.

Uploaded by Marian-Bernice Haligah

MH12007

Protecting Patient Privacy
Data Sciences

Group 4
Marian, Mohammed, Linda, Marie, Michelle, Mahita, and Mannie
Table of Contents

01. Challenges with free-text narratives in EMRs/EHRs
02. Proposed solution: Entity Recognition Model
03. Data de-identification for research
04. BERT Implementation Strategy
Challenges
Why could finding embedded personal information in free-text narratives be challenging?

01. Variability in Language & Contextual Ambiguity: Different healthcare professionals may document patient information using diverse terminology, expressions, or writing styles.
02. Diverse Data Sources: Healthcare institutions may use various electronic health record (EHR) systems, each with its own data format and structure.
03. Manual Labelling: Labelled data requires subject matter experts to annotate raw text manually before models can be trained, which introduces a risk of bias and human error.
04. Unrepresentative Data: Models trained on unrepresentative data may miss important personal health data from diverse populations and clinical settings.
Challenges (cont.)
Why could finding embedded personal information in free-text narratives be challenging?

05. Noise & Incomplete Data: Personal information could be fragmented, abbreviated, or implied, making it challenging for an algorithm to identify and extract the complete set of personal data.
06. Patient Variability: Personal health information, such as a patient's medical conditions and history, can vary widely between patients.
07. Handling Negations: Medical narratives often include statements that indicate the absence of a condition (e.g., "denies chest pain").
08. Patient Consent & Preferences: Respecting and managing varied patient consent and preferences for the use of personal information poses complex challenges.
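To illustrate challenge 07, here is a minimal rule-based sketch in the spirit of the NegEx algorithm. It checks whether a negation cue appears within a few tokens before a clinical concept. The cue list, the window size of 5, and the function name `is_negated` are hypothetical choices. The sketch ignores scope terminators such as "but", which real negation detectors handle.

```python
import re

# Hypothetical, tiny cue list; production negation detectors use large curated lists
NEGATION_CUES = ("no", "denies", "without", "negative for", "no history of")


def is_negated(sentence: str, concept: str, window: int = 5) -> bool:
    """Return True if any occurrence of `concept` has a negation cue in the
    `window` tokens immediately before it."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    concept_tokens = re.findall(r"[a-z]+", concept.lower())
    n = len(concept_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == concept_tokens:
            preceding = " ".join(tokens[max(0, i - window):i])
            padded = f" {preceding} "  # pad so cues only match whole words
            if any(f" {cue} " in padded for cue in NEGATION_CUES):
                return True
    return False


print(is_negated("Patient denies chest pain.", "chest pain"))   # True
print(is_negated("Patient reports chest pain.", "chest pain"))  # False
```

A fixed token window keeps the rule simple. It also explains why such rules misfire on long or compound sentences, which is part of the motivation for context-aware models like BERT.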
BERT for Named Entity Recognition (NER)

BERT: Bidirectional Encoder Representations from Transformers

A deep transformer language model developed by Google that can be fine-tuned for NER
Why BERT?
Since 2021, BERT has been the most popular NER model applied to EHRs.
As of December 2022, the most advanced models for clinical entity recognition were those based on BERT.

From:
Durango, M. C., et al. (2023, October). Named Entity Recognition in Electronic Health Records: A Methodological Review. Healthcare Informatics Research.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10651400/
How does BERT find identifiers?

Token classification: each token in the text is assigned a class
Ex. "Barack Obama" (tokens) is a person (class)
Ex. extracting breast cancer phenotypes
Pre-trained, then fine-tuned with clinical text data (e.g., EHR data)
Examples of BERT models applied to EHR data: BioBERT, BioClinicalBERT
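In token classification, a fine-tuned model emits one label per token. This is commonly done in the BIO scheme: B- begins an entity, I- continues it, and O marks tokens outside any entity.

The sketch below assumes the labels have already been predicted and shows how they are grouped back into identifier spans. The entity types PERSON and DATE and the example sentence are hypothetical. A real pipeline would also need to merge BERT's WordPiece sub-tokens first.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []
    for token, label in zip(tokens, labels):
        starts_new = label.startswith("B-") or (
            label.startswith("I-") and label[2:] != current_type  # stray I- acts as B-
        )
        if starts_new:
            if current_type:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = label[2:], [token]
        elif label.startswith("I-"):
            current_tokens.append(token)  # continue the open entity
        else:  # "O": close any open entity
            if current_type:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


tokens = ["Barack", "Obama", "was", "seen", "on", "March", "3", "."]
labels = ["B-PERSON", "I-PERSON", "O", "O", "O", "B-DATE", "I-DATE", "O"]
print(bio_to_spans(tokens, labels))
# [('PERSON', 'Barack Obama'), ('DATE', 'March 3')]
```

The recovered spans are what the masking step later operates on.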
Advantages and Disadvantages of BERT
Advantages:
Pre-trained; can be fine-tuned for clinical text
Fine-tuning is inexpensive
Open source: https://github.com/google-research/bert
The first unsupervised, deeply bidirectional system for pre-training NLP
Unsupervised: trained on large amounts of publicly available text data
Deeply bidirectional: able to learn the context of a word from all of its surroundings
Disadvantages:
Black box
Never implemented in a real-world clinical setting
Process for De-identifying Data
De-Identification Guidelines for Structured
Data from the Information and Privacy
Commissioner of Ontario:
1. determine the release model
2. classify variables
3. determine an acceptable re-identification
risk threshold
4. measure the data risk
5. measure the context risk
6. calculate the overall risk
7. de-identify the data
8. assess data utility
9. document the process
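Steps 3 to 6 reduce to simple arithmetic once the inputs are known. The sketch below assumes the multiplicative model used in risk-based de-identification frameworks: overall risk = data risk × context risk. Here, context risk is read as the probability that a re-identification attempt is made.

The function names, input values, and threshold are hypothetical placeholders, not figures taken from the IPC guidelines.

```python
def overall_risk(data_risk: float, context_risk: float) -> float:
    """Step 6 (assumed multiplicative model): overall risk = data risk x context risk."""
    for name, value in (("data_risk", data_risk), ("context_risk", context_risk)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a probability between 0 and 1")
    return data_risk * context_risk


def must_deidentify_further(data_risk: float, context_risk: float, threshold: float) -> bool:
    """Step 7 is needed while the overall risk exceeds the step-3 threshold."""
    return overall_risk(data_risk, context_risk) > threshold


# Hypothetical inputs: the smallest group of records sharing the same quasi-identifiers
# has 5 members (data risk 1/5), and there is a 30% chance of a re-identification attempt.
print(round(overall_risk(1 / 5, 0.3), 3))                    # 0.06
print(must_deidentify_further(1 / 5, 0.3, threshold=0.05))   # True
```

With these numbers, the data would need further de-identification (step 7), then re-measurement, before release.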
#2 - Classifying Variables
Direct identifiers: variables that can identify an individual either on their own or in combination with other readily available information
Indirect identifiers/quasi-identifiers: variables about which an adversary is assumed to have background knowledge, so they can be used individually or in combination to re-identify an individual

https://www.ipc.on.ca/wp-content/uploads/2016/08/Deidentification-Guidelines-for-Structured-Data.pdf
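Once variables are classified, direct identifiers are masked and quasi-identifiers drive the risk measurement. A common way to measure data risk (assumed here) is to group records into equivalence classes, meaning records that share the same quasi-identifier values:

- Maximum risk: 1 / (size of the smallest class).
- Average risk: the mean of 1 / (class size) across all records.

The column names, the records, and the split between direct and quasi-identifiers below are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical classification of the columns in a toy extract
DIRECT_IDENTIFIERS = {"name", "health_card_number"}
QUASI_IDENTIFIERS = ("birth_year", "sex", "postal_prefix")


def drop_direct_identifiers(record: Dict[str, str]) -> Dict[str, str]:
    """Direct identifiers are masked outright, so remove them before measuring risk."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def data_risk(records: List[Dict[str, str]]) -> Tuple[float, float]:
    """Return (maximum, average) risk from equivalence classes on the quasi-identifiers."""
    classes = Counter(tuple(r[q] for q in QUASI_IDENTIFIERS) for r in records)
    max_risk = 1 / min(classes.values())    # risk of the most exposed record
    avg_risk = len(classes) / len(records)  # equals the mean of 1/class size per record
    return max_risk, avg_risk


records = [
    {"name": "A", "health_card_number": "111", "birth_year": "1980", "sex": "F", "postal_prefix": "M5V"},
    {"name": "B", "health_card_number": "222", "birth_year": "1980", "sex": "F", "postal_prefix": "M5V"},
    {"name": "C", "health_card_number": "333", "birth_year": "1975", "sex": "M", "postal_prefix": "K1A"},
]
released = [drop_direct_identifiers(r) for r in records]
print(data_risk(released))  # (1.0, 0.6666666666666666)
```

The third record is unique on its quasi-identifiers, so the maximum risk is 1.0. Generalizing or suppressing quasi-identifiers would be needed to lower it.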
#7 - De-identifying the Data: Masking Direct Identifiers
Masking: the process of removing a variable or replacing it with pseudonymous or encrypted information

Redaction: permanent, selective removal of information

Tokenization (pseudonymization): replacing identifiers with tokens (surrogate values)

Encryption: encoding information so that it is only decipherable by authorized individuals

https://www.ipc.on.ca/wp-content/uploads/2016/08/Deidentification-Guidelines-for-Structured-Data.pdf
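The difference between redaction and tokenization can be sketched on spans found by the NER step. The span format (start offset, end offset, type) and the `vault` dictionary are hypothetical.

- Redaction discards the original value.
- Tokenization keeps a mapping so an authorized custodian could reverse it. In practice, that mapping must be stored separately and securely.

Encryption is omitted here because it needs a third-party library.

```python
import secrets
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, entity type) from the NER step


def redact(text: str, spans: List[Span]) -> str:
    """Redaction: permanently replace each identifier with its type label."""
    # Work right to left so earlier offsets stay valid after each replacement
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


def tokenize(text: str, spans: List[Span], vault: Dict[str, str]) -> str:
    """Tokenization: swap each identifier for a random surrogate recorded in `vault`."""
    for start, end, label in sorted(spans, reverse=True):
        original = text[start:end]
        # Reuse the same surrogate if this identifier was seen before
        token = vault.setdefault(original, f"{label}_{secrets.token_hex(4)}")
        text = text[:start] + token + text[end:]
    return text


note = "John Smith visited on 2023-03-01."
spans = [(0, 10, "NAME"), (22, 32, "DATE")]
print(redact(note, spans))  # [NAME] visited on [DATE].
vault: Dict[str, str] = {}
print(tokenize(note, spans, vault))  # e.g. NAME_1a2b3c4d visited on DATE_5e6f7a8b.
```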
Coding Identifiers Using Cryptography Algorithms
One-way hash function: converts data into a fixed-length string of characters that cannot feasibly be reversed. The same technique is commonly used to store passwords.
Alphabet shift ciphers: replace each letter in the message with the letter that is a fixed number of positions further along in the alphabet.
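Both techniques fit in a few lines of standard-library Python.

A plain hash of a low-entropy identifier, such as a health card number, can be reversed by hashing every possible value. For that reason, this sketch swaps in a keyed hash (HMAC-SHA-256) with a secret key held by the data custodian. The key and identifier values are hypothetical.

The shift (Caesar) cipher is shown only for illustration. With just 25 possible shifts, it is trivially broken and should not protect real identifiers.

```python
import hashlib
import hmac
import string


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Keyed one-way hash (HMAC-SHA-256): the same input and key always give the same code."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def shift_cipher(message: str, shift: int) -> str:
    """Alphabet shift cipher: move each letter `shift` places, wrapping around."""
    result = []
    for ch in message:
        if ch in string.ascii_lowercase:
            result.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            result.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            result.append(ch)  # digits, spaces and punctuation are left unchanged
    return "".join(result)


key = b"hypothetical-secret-key"
print(pseudonymize("1234-567-890", key))           # 64 hexadecimal characters
print(shift_cipher("Smith", 3))                    # Vplwk
print(shift_cipher(shift_cipher("Smith", 3), -3))  # Smith
```

Because the hash is deterministic for a given key, the same patient receives the same code across datasets. This allows records to be linked without exposing the identifier.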
Solution Implementation
Implementing this solution requires assembling a project team with diverse skills and using specific tools and technologies.

Skills:
NLP Experts, Machine Learning Engineer, Data Scientists, Project Manager & Stakeholders, Healthcare Domain Experts, Privacy & Compliance Experts, Ethics Committee, Linguists

Tech & Analytic Technologies:
NLP Libraries, BERT Libraries, Data Processing Tools, Privacy & Compliance Tools, Deployment Tools, Cloud Storage Services, Database Management System, Validation and Evaluation Tools

(Lee et al., 2022; Maiti, 2023; Owuondo, 2023; Tran et al., 2019; Yang et al., 2020)
Additional Built-in Data
Medical Dictionary/Ontology
Labelled Training Datasets

(Friedlin & McDonald, 2008)
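A medical dictionary can complement the NER model in two directions:

- Lists of known identifiers, such as common names, flag candidate tokens.
- A list of medical terms stops clinical vocabulary that looks like a name (e.g., "Parkinson") from being scrubbed.

The sketch below is a hypothetical illustration with tiny word lists. It is not the tool described by Friedlin & McDonald (2008).

```python
import re
from typing import List, Set

# Hypothetical, tiny word lists; a real deployment would load large name lists and an ontology
NAME_LIST: Set[str] = {"alice", "smith", "parkinson"}
MEDICAL_TERMS: Set[str] = {"parkinson", "alzheimer", "crohn"}


def flag_name_tokens(text: str) -> List[str]:
    """Return tokens that match the name list but are not protected medical terms."""
    flagged = []
    for token in re.findall(r"[A-Za-z]+", text):
        word = token.lower()
        if word in NAME_LIST and word not in MEDICAL_TERMS:
            flagged.append(token)
    return flagged


print(flag_name_tokens("Alice Smith was assessed for Parkinson disease."))
# ['Alice', 'Smith']
```

A pure dictionary rule would miss a patient who is actually named Parkinson. Resolving such cases needs sentence context, which is where a fine-tuned model adds value.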

Thank you very much!
References
Dentons. (2022, June 24). Considerations for de-identifying personal health information: Guidance from Ontario’s Information and Privacy Commissioner. https://www.dentons.com/en/insights/articles/2022/june/23/considerations-for-deidentifying-personal-health-information

Durango, M. C., et al. (2023, October). Named entity recognition in electronic health records: A methodological review. Healthcare Informatics Research. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10651400/

Friedlin, F. J., & McDonald, C. J. (2008). A software tool for removing patient identifying information from clinical documents. Journal of the American Medical Informatics Association: JAMIA, 15(5), 601–610. https://doi.org/10.1197/jamia.M2702

Google Research. (2020, March 11). BERT. GitHub. https://github.com/google-research/bert

Information and Privacy Commissioner of Ontario. (2016, June). De-identification guidelines for structured data. https://www.ipc.on.ca/wp-content/uploads/2016/08/Deidentification-Guidelines-for-Structured-Data.pdf

Leaman, R., Khare, R., & Lu, Z. (2015). Challenges in clinical natural language processing for automated disorder normalization. Journal of Biomedical Informatics, 57, 28–37. https://doi.org/10.1016/j.jbi.2015.07.010

Lee, J., Jeong, J., Jung, S., Moon, J., & Rho, S. (2022). Verification of de-identification techniques for personal information using tree-based methods with Shapley values. Journal of Personalized Medicine, 12(2), 190. https://doi.org/10.3390/jpm12020190

Maiti, S. (2023, April 4). Extracting medical information from clinical text with NLP. Analytics Vidhya. https://www.analyticsvidhya.com/blog/2023/02/extracting-medical-information-from-clinical-text-with-nlp/

Office for Civil Rights. (2023, February 22). Methods for de-identification of PHI. HHS.gov. https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html#standard

Owuondo, J. (2023). A comprehensive health electronic record system with MySQL RDMS, QGIS database and MongoDB. SSRN. http://dx.doi.org/10.2139/ssrn.4548519

Tran, N. H., Nguyen-Ngoc, T. A., Le-Khac, N. A., & Kechadi, M. (2019). A security-aware access model for data-driven EHR system. arXiv. https://doi.org/10.48550/arXiv.1908.10229

Turing Enterprises Inc. (2022, June 10). A comprehensive guide to named entity recognition (NER). Turing. https://www.turing.com/kb/a-comprehensive-guide-to-named-entity-recognition

Yang, X., Bian, J., Hogan, W. R., & Wu, Y. (2020). Clinical concept extraction using transformers. Journal of the American Medical Informatics Association: JAMIA, 27(12), 1935–1942. https://doi.org/10.1093/jamia/ocaa189
