AI Enhances Cognitive Decline Detection in Medical Records: Study
October 16, 2024
In a recent publication in eBioMedicine, researchers assessed the potential of large language models (LLMs) to successfully identify early signs of cognitive decline through electronic health records (EHRs). This study is significant given that millions worldwide are affected by Alzheimer's disease and other forms of dementia. Early detection could lead to more effective treatments and improved patient care.
While LLMs have shown promise in healthcare tasks such as information extraction, entity recognition, and question answering, their effectiveness at identifying specific clinical conditions such as cognitive decline from EHRs remains uncertain. Little research has compared these models with traditional artificial intelligence methods such as machine learning or deep learning.
The current study sought to determine how well LLMs detect cognitive decline in EHR data. It also aimed to compare the performance of these models with that of conventional models trained on domain-specific data.
Researchers from Mass General Brigham evaluated both proprietary and open-source LLMs on clinical notes written in the four years before a 2019 diagnosis of mild cognitive impairment (MCI) in patients aged 50 years or older. The team excluded cases where the cognitive decline was transient, reversible, or recovering.
The team prompted GPT-4 (proprietary) and Llama 2 (open-source) through HIPAA-compliant cloud computing environments. Prompt-augmentation methods, including hard prompting, retrieval-augmented generation, and error-analysis-based instructions, were used to refine the prompts.
The baseline models were XGBoost and an attention-based deep neural network built on a bidirectional long short-term memory (LSTM) network. Based on performance metrics, the researchers selected the best-performing approach: a three-model ensemble decided by majority vote and scored with metrics derived from the confusion matrix.
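To make the majority-vote idea concrete, here is a minimal Python sketch. It is not the study's code: the function names and the toy labels below are invented for illustration, and each model is assumed to output 1 (cognitive decline) or 0 (no decline) per note.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label most models chose for one note (labels are 0 or 1)."""
    return Counter(votes).most_common(1)[0][0]


def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives, true negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred):
    """Derive precision, recall, and F1 from the confusion-matrix counts."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical predictions for six notes; these are NOT data from the study.
truth = [1, 1, 1, 1, 0, 0]
llm = [1, 0, 1, 1, 0, 0]
xgb = [1, 1, 0, 1, 0, 0]
dnn = [0, 1, 1, 1, 0, 1]

# With three binary voters there is always a strict majority, so no ties.
ensemble = [majority_vote(v) for v in zip(llm, xgb, dnn)]

print("ensemble:", ensemble)
print("LLM alone:", precision_recall_f1(truth, llm))
print("ensemble :", precision_recall_f1(truth, ensemble))
```

In this toy data each model makes a different mistake, so the vote cancels them out. That is the intuition behind combining an LLM with locally trained models whose errors differ.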
Results indicate that GPT-4 outperformed its open-source counterpart but fell short of the conventional models trained on local EHR data. When the models were combined into an ensemble, however, performance improved markedly, reaching a precision of 90%, a recall of 94%, and an F1 score of 92%.
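As a quick consistency check on the reported figures, the F1 score is the harmonic mean of precision (P) and recall (R). Using the rounded values from the article, not the study's underlying counts:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.90 \times 0.94}{0.90 + 0.94}
    = \frac{1.692}{1.84}
    \approx 0.92
```

The result matches the reported F1 score of 92%.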
GPT-4 demonstrated its ability to highlight dementia therapies such as donepezil (marketed as Aricept). It also recognized diagnoses like mild neurocognitive disorders, major neurocognitive disorders, and vascular dementia more effectively than previous models. Furthermore, it acknowledged the emotional implications associated with cognitive issues.
While GPT-4 showed promise in handling ambiguous language and complex information without confusing negations or contextual factors, there were instances where it overinterpreted or was overly cautious. Both GPT-4 and attention-based deep neural network models occasionally misread clinical test findings.
In conclusion, combining LLMs with traditional AI methods improved accuracy in detecting early signs of cognitive decline. Despite this advance, further research is needed to refine LLMs trained on general-domain data for clinical decision-making. Future studies should aim to combine LLMs with more localized models that draw on medical information and domain expertise to optimize performance on specific tasks.