Natural Language Processing and Opioid Use Disorder: A Holistic Social Determinants of Health Approach

Maxwell Sharkey and Kylie Degner, “Natural Language Processing and Opioid Use Disorder: A Holistic Social Determinants of Health Approach” 

Mentor: Phoenix Do, Public Health, Public Health (Joseph J. Zilber School of) 

Poster #142 

Social determinants of health (SDoH), such as social and family support, housing stability, financial resources, and the quality of neighborhood environments, are critical factors influencing a wide range of health outcomes, including opioid use disorder (OUD). However, much of this information remains underutilized in electronic health records (EHRs) because it is recorded in unstructured clinical notes and is difficult to extract, creating a significant barrier to translational research. This pilot study represents the initial steps toward a streamlined process for efficiently integrating SDoH into clinical decision-making. It applies natural language processing (NLP) to extract SDoH data from the clinical notes of patients diagnosed with OUD within Wisconsin’s Froedtert Health and Medical College of Wisconsin (FH & MCW) Health System. A random sample of 300 de-identified notes is being manually annotated to create a gold-standard set for NLP model training and evaluation. An annotation guideline has been developed and is continuously revised to ensure accurate identification of SDoH categories, including socio-demographic (e.g., education, employment), behavioral and lifestyle (e.g., tobacco use, alcohol consumption), social and family support (e.g., support person, communication with friends and family), and interpersonal safety. To date, approximately 150 clinical notes have been manually annotated, with inter-annotator agreement scores improving as the project progresses. The presence of SDoH documentation varies by provider type, ranging from 72% of notes from psychosocial providers (e.g., social workers) to 14% of notes from nurses. Once annotation is complete, the 300 annotated notes will be used to fine-tune validated NLP models for extracting SDoH information from 8,000 additional unstructured clinical notes. Preliminary findings indicate substantial variability in SDoH documentation, highlighting gaps that can inform future improvements in SDoH data quality and usability for research and clinical care.
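
The abstract reports inter-annotator agreement scores without naming the metric. As an illustration only, the sketch below assumes Cohen's kappa, a common chance-corrected agreement measure for two annotators. The function name, the label values, and the six example notes are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    Assumption: the study's actual agreement metric is not stated;
    kappa is used here purely as an example.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of notes.")

    n = len(labels_a)
    # Observed agreement: share of notes where both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement by chance, from each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout; agreement is total.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels: whether each of six notes documents housing stability.
annotator_1 = ["present", "present", "absent", "absent", "present", "absent"]
annotator_2 = ["present", "absent", "absent", "absent", "present", "absent"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.3f}")
```

In this made-up example the annotators agree on five of six notes (observed agreement 0.833) against a chance agreement of 0.5, giving a kappa of about 0.667. Computing the score separately per SDoH category and per annotation round would be one way to track whether guideline revisions are improving consistency.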