Improvement of Oncology NLP Research

Oncology Data Precision: Licensing, De-identification, & Annotation for NLP Model Innovation

Revolutionizing Cancer Care with Cutting-Edge NLP Technologies

The client, a major player in the healthcare industry, required an advanced NLP solution to process a substantial volume of oncology medical records. As part of a pivotal initiative to refine oncology research, the client needed to balance detailed data analysis with stringent privacy standards. This case study outlines our contributions to the client’s research through high-fidelity data annotation, rigorous de-identification, and the application of Natural Language Processing (NLP) techniques, all within the regulatory framework of HIPAA.

Volume

Data Licensing + Data De-id | 10 Pages
Oncology Relationships | 10 Pages
Non Oncology Domain | 10 Pages
Negation | 10 Pages
Oncology Domain | 10 Pages
NER + Relationship Mapping | 10 Pages

Challenges

The project required a nuanced understanding of clinical documentation, precise identification of medical entities, and the ability to apply negation labels accurately, all within a secure framework that protects patient privacy according to HIPAA regulations. The endeavor demanded not only technical expertise in handling large volumes of complex data but also a strategic approach to incorporate feedback and maintain quality across all stages of the annotation process.

Objectives

Detailed Description of Services

Category | Description
Comprehensive Clinical Data Coverage | Spanning various note types, care settings, and oncological subspecialties, ensuring a robust dataset reflective of diverse clinical scenarios.
Rigorous De-identification | Ensuring all labeled records are de-identified in compliance with HIPAA’s Safe Harbor method, assuring client confidence in data privacy and security.
Annotation Guidelines | Creation and implementation of standard data annotation guidelines for preparing Labeled Records in line with HIPAA standards.
Advanced Annotation Strategies | Manual annotation of 10,000 pages of oncology-related records, with a detailed focus on identifying negation statuses and other pertinent information in accordance with established guidelines.
Rigorous Quality Assurance | Attaining the quality standard specified in the annotation guidelines.

Solution

Our approach involved the following key strategies:

Customized Oncology Dataset Compilation

From an archive of over 5 million EHRs, a carefully chosen subset was extracted to address the client’s specialized requirements for oncology data with a focus on genomic entities. The collection process involved creating an exhaustive list of tumor markers, genes, variants, and TNM stages, then running keyword searches to pinpoint documents rich in this data. Regular expressions were used to identify a range of genetic variations and cancer stages. This approach, combined with broad coverage of document types, specialties, care settings, and physicians, ensured a comprehensive and relevant oncology dataset.
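The keyword-and-regex screening described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the keyword list, the three patterns, and the names `KEYWORDS`, `find_genomic_mentions`, and `is_candidate` are all assumptions made for this example, and real TNM and variant notation is more varied than these patterns cover.

```python
import re

# Hypothetical keyword list (genes and tumor markers); the real list is not published.
KEYWORDS = ["EGFR", "KRAS", "ALK", "BRCA1", "BRCA2", "HER2", "PD-L1", "CEA", "PSA"]

# Illustrative patterns only; they cover common notations, not every variant.
TNM_RE = re.compile(
    r"\b[cpy]?T(?:[0-4][a-d]?|is|X)\s?N(?:[0-3][a-c]?|X)\s?M(?:[01][a-c]?|X)\b"
)
STAGE_RE = re.compile(r"\bStage\s+(?:IV|III|II|I|0)[A-C]?\b", re.IGNORECASE)
VARIANT_RE = re.compile(
    r"\b(?:exon\s+\d+\s+(?:deletion|insertion)"  # e.g. "exon 19 deletion"
    r"|[A-Z]\d{1,4}[A-Z]"                        # protein change, e.g. "G12C"
    r"|c\.\d+[ACGT]>[ACGT])\b"                   # coding change, e.g. "c.35G>T"
)


def find_genomic_mentions(text: str) -> dict:
    """Collect keyword, stage, TNM, and variant mentions found in one document."""
    return {
        "keywords": [k for k in KEYWORDS if re.search(rf"\b{re.escape(k)}\b", text)],
        "stages": STAGE_RE.findall(text),
        "tnm": TNM_RE.findall(text),
        "variants": VARIANT_RE.findall(text),
    }


def is_candidate(text: str, min_hits: int = 2) -> bool:
    """Flag a document as data-rich when it has at least `min_hits` mentions."""
    hits = find_genomic_mentions(text)
    return sum(len(found) for found in hits.values()) >= min_hits
```

A document would then be kept for the subset when `is_candidate(text)` returns true; the `min_hits` threshold is an arbitrary choice for the sketch and would need tuning against manually reviewed samples.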

Rigorous De-identification

The process adhered strictly to HIPAA’s Safe Harbor method for de-identification, giving the client confidence in data privacy and security. This involved removing all Protected Health Information (PHI) and replacing it with labeled placeholders, thereby maintaining the utility of the data while protecting patient confidentiality.

De-identification Variables

Category | Subcategory
Name | Patient name, Physician name, Nurse practitioner name, Family member name, Medical center name, Clinic name, Nursing home name, Company name, University name
Age | (none)
Date | Date pattern, Month Year pattern, Day Month pattern, Day Year pattern, Day, Month, Year, Season
Location | Country, State, City, Street, ZIP Code, Room number, Suite number, Floor number
ID | Social security number, Medical record number, Health plan beneficiary number, Account number, Certificate/License number, Biometric id, Record id, Accession number, Vehicle identification number, License plate number, Device identifiers and serial number
Contact | Telephone number, Fax number, Email address, Web URL, IP address

Example:

On September 25, 2106, at 11:00 am, Mr. Harry Pace, aged 90, was admitted to Forrest General Hospital for a scheduled hip surgery, previously consulted by his primary care physician Dr. Jose Martin, and attended by Kendra Reith, MD. During his stay, he was under the care of Mary Hu, N.P., and Suzan Ray, R.N., with R. Charles Melancon, PA, also being consulted. His operation, conducted on the same day as admission, was successful with no complications reported. Following surgery, Mr. Pace was transferred to Room 202, Floor 2, for recovery. His wife, Emma Pace, was present throughout and was provided with all necessary updates. During his brief stay, his medical records, including MRN MR99062619 and Account KV000014764, were handled according to the standard protocols of Gracewood Nursing Home, his previous residence. He was discharged later the same day to the care of Oakland Outpatient Clinic for further recuperation. Throughout the process, all procedures were documented and secured with adherence to confidentiality standards.

Example: De-identified

On [Date Pattern], at 11:00 am, Mr. [Patient Name], aged [Age], was admitted to [Medical Center Name] for a scheduled hip surgery, previously consulted by his primary care physician Dr. [Physician Name], and attended by [Physician Name], MD. During his stay, he was under the care of [Nurse Practitioner], N.P., and [Nurse Practitioner], R.N., with [Physician Name], PA, also being consulted. His operation, conducted on the same day as admission, was successful with no complications reported. Following surgery, Mr. [Patient Name] was transferred to Room [Room Number], Floor [Floor Number], for recovery. His wife, [Family Member Name], was present throughout and was provided with all necessary updates. During his brief stay, his medical records, including MRN [Medical Record Number] and Account [Account Number], were handled according to the standard protocols of [Nursing Home Name], his previous residence. He was discharged later the same day to the care of [Clinic Name] for further recuperation. Throughout the process, all procedures were documented and secured with adherence to confidentiality standards.
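The placeholder substitution shown in this example could be sketched along the following lines. It is a simplified illustration under stated assumptions, not the method actually used on the project: the PHI surface strings are assumed to come from human annotators or an upstream recognizer, the identifier formats in `STRUCTURED_PATTERNS` are guessed from the sample note, and the names `deidentify` and `known_phi` are hypothetical. Pattern matching alone would not satisfy Safe Harbor without expert review.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

# Assumed identifier formats, modeled on the sample note only.
# The "prefix" group is kept in the output; the rest is replaced.
STRUCTURED_PATTERNS = [
    ("Medical Record Number", re.compile(r"(?P<prefix>\bMRN )[A-Z]{2}\d{8}\b")),
    ("Account Number", re.compile(r"(?P<prefix>\bAccount )[A-Z]{2}\d{9}\b")),
    ("Date Pattern", re.compile(rf"(?P<prefix>)\b(?:{MONTHS}) \d{{1,2}}, \d{{4}}\b")),
]


def deidentify(text: str, known_phi: dict) -> str:
    """Replace known PHI strings and structured identifiers with [Label] placeholders.

    `known_phi` maps a surface string (e.g. "Harry Pace") to its placeholder
    label (e.g. "Patient Name").
    """
    out = text
    if known_phi:
        # Longest strings first, in a single pass, so "Harry Pace" wins over
        # "Pace" and already-inserted placeholders are never rescanned.
        surfaces = sorted(known_phi, key=len, reverse=True)
        combined = re.compile(r"\b(?:" + "|".join(map(re.escape, surfaces)) + r")\b")
        out = combined.sub(lambda m: f"[{known_phi[m.group(0)]}]", out)
    for label, pattern in STRUCTURED_PATTERNS:
        out = pattern.sub(lambda m, label=label: f"{m.group('prefix')}[{label}]", out)
    return out


note = ("On September 25, 2106, Mr. Harry Pace was seen by Dr. Jose Martin. "
        "MRN MR99062619 and Account KV000014764 were updated. Mr. Pace went home.")
phi = {"Harry Pace": "Patient Name", "Pace": "Patient Name",
       "Jose Martin": "Physician Name"}
print(deidentify(note, phi))
```

Keeping the category name inside the placeholder, rather than deleting the span outright, is what preserves the sentence structure that downstream annotation relies on.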

Annotation Guidelines & Advanced Annotation Techniques

Shaip was instrumental in establishing and implementing standard data annotation guidelines, which ensured that all Labeled Records were prepared consistently and in compliance with HIPAA standards. Moreover, 10,000 pages from various medical records were meticulously annotated, with a focus on detailed labeling of negation statuses and other clinically relevant entities across various oncology subspecialties. The annotation was carried out by a team of expert annotators with specialized knowledge in oncology and data privacy regulations.

Complex Annotation Criteria

Category | Subcategory
Date Annotation (Oncology) | Diagnosis Date, Stage Date, Onset, Procedure Date, Med Date Started, Med Date Ended, Radiation Date Started, Radiation Date Ended
Disease (Oncology) | Cancer Problem, Histology, Clinical Status, Body Site, Behaviour, Grade, Cancer Stage, TNM stage, Tumour Marker Test, Dimensions, Code
Treatment (Oncology) | Cancer Medicine, Drug Dosage, Frequency, Cancer Surgery, Surgery Result, Radiation Modality, Radiation Dosage
Genomics | Variation Code, Gene Studied, Method, Specimen
Negation | Negative, Possible Negative, Uncertain, Possible Positive
Clinical NER Relationships | Cancer Problem – Body Site, Histology – Body Site, Behaviour – Body Site, Cancer Surgery – Body Site, Radiation Modality – Body Site, Histology – Grade, Cancer Problem – Dimension

Example:

Oncology Clinical Note Statement

“Patient Jane Doe was diagnosed with Stage IIIB non-small cell lung cancer (NSCLC), specifically adenocarcinoma, on 03/05/2023. The cancer is located in the right lower lobe of the lung. It is classified as T3N2M0 according to the TNM staging system, with a tumor size of 5 cm x 3 cm. An EGFR exon 19 deletion was identified through PCR analysis of the tumor biopsy specimen. Chemotherapy with Carboplatin AUC 5 and Pemetrexed 500 mg/m² was initiated on 03/20/2023 and is to be administered every 3 weeks. External beam radiation therapy (EBRT) at a dose of 60 Gy in 30 fractions commenced on 04/01/2023. The patient’s treatment is ongoing, and there is no evidence of brain metastases on the recent MRI. The possibility of lymphovascular invasion is yet to be determined, and the patient’s tolerance for the full chemotherapy regimen remains uncertain.”
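To make the annotation criteria concrete, here is one hypothetical way the entities, negation statuses, and relationships in this note could be represented as character-offset spans. The `Entity` and `Relation` classes, the `mark` helper, and the choice of which label goes on which phrase are illustrative assumptions based on the category table, not the project's actual schema or gold annotations. The note text is abridged.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    label: str                      # subcategory name from the criteria table
    text: str                       # exact surface string in the note
    start: int                      # character offset, inclusive
    end: int                        # character offset, exclusive
    negation: Optional[str] = None  # e.g. "Negative", "Uncertain"


@dataclass(frozen=True)
class Relation:
    label: str
    head: Entity
    tail: Entity


def mark(note: str, text: str, label: str, negation: Optional[str] = None) -> Entity:
    """Build an Entity from the first occurrence of `text` in `note`."""
    start = note.index(text)  # raises ValueError if the phrase is absent
    return Entity(label, text, start, start + len(text), negation)


NOTE = (
    "Patient Jane Doe was diagnosed with Stage IIIB non-small cell lung cancer "
    "(NSCLC), specifically adenocarcinoma, on 03/05/2023. The cancer is located "
    "in the right lower lobe of the lung. The patient's treatment is ongoing, and "
    "there is no evidence of brain metastases on the recent MRI. The possibility "
    "of lymphovascular invasion is yet to be determined."
)

# Label assignments below are one plausible reading, not verified annotations.
stage = mark(NOTE, "Stage IIIB", "Cancer Stage")
problem = mark(NOTE, "non-small cell lung cancer", "Cancer Problem")
histology = mark(NOTE, "adenocarcinoma", "Histology")
dx_date = mark(NOTE, "03/05/2023", "Diagnosis Date")
site = mark(NOTE, "right lower lobe of the lung", "Body Site")
mets = mark(NOTE, "brain metastases", "Cancer Problem", negation="Negative")
lvi = mark(NOTE, "lymphovascular invasion", "Cancer Problem", negation="Uncertain")

entities = [stage, problem, histology, dx_date, site, mets, lvi]
relations = [
    Relation("Cancer Problem – Body Site", problem, site),
    Relation("Histology – Body Site", histology, site),
]
```

Storing offsets alongside the text lets a reviewer verify each span against the source note, and keeping negation as an attribute of the entity means "no evidence of brain metastases" is not mistaken for a positive finding.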

Rigorous Quality Assurance

We implemented a flexible project management framework that allowed client feedback to be integrated effectively while upholding stringent quality standards. A comprehensive quality assurance protocol was enforced, aligned with the guidelines, to reach the required quality benchmarks. This protocol featured successive rounds of review and verification to secure the precision and dependability of the annotated data. Such quality oversight is crucial for building a dependable NLP solution that supports informed clinical decision-making and research.
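The case study does not say which metric was used to check quality between review rounds. As one common option, agreement between an annotator and a reviewer on negation labels can be measured with Cohen's kappa; the sketch below is a generic illustration of that statistic, and the function name `cohen_kappa` and the sample labels are made up for the example.

```python
from collections import Counter


def cohen_kappa(labels_a: list, labels_b: list) -> float:
    """Cohen's kappa for two raters labeling the same items, in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)
    # Share of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Made-up negation labels for four entities.
annotator = ["Negative", "Negative", "Uncertain", "Possible Positive"]
reviewer = ["Negative", "Uncertain", "Uncertain", "Possible Positive"]
kappa = cohen_kappa(annotator, reviewer)
```

A batch whose score falls below an agreed threshold could be sent back for another review round; the threshold itself would be set in the annotation guidelines.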

Outcome

We successfully delivered 10,000 high-quality, de-identified Labeled Records, providing a secure and valuable dataset for the client’s NLP model development. The meticulous application of NLP and adherence to HIPAA de-identification standards resulted in a highly refined dataset that will underpin the client’s ongoing and future oncology research, with the ultimate aim of improving oncology patient outcomes and care delivery efficiency.

The success of the project illustrates our ability to handle complex medical data with precision, contributing to the client’s aim of improving patient care outcomes and accelerating the pace of healthcare innovation.

Our partnership with Shaip has been instrumental in advancing our NLP capabilities within the oncology domain. The professional handling of 10,000 medical records, annotated with detailed negation and other clinical entities, demonstrated their commitment to excellence and compliance. Moreover, their commitment to privacy standards like HIPAA has provided us with invaluable resources to drive forward our AI initiatives for developing cutting-edge oncological treatments and diagnostics.
