Previous works have evaluated the quality of large language model (LLM) responses to biomedical and clinical knowledge questions. However, the ability of LLMs to improve efficiency and reduce cognitive burden has not been established, and their effect on clinical decision-making is unknown. To bridge this knowledge gap, Shan Chen, Marco Guevara, Shalini Moningi, et al. conducted a proof-of-concept end-user study assessing the effect and safety of LLM-assisted patient messaging. The authors showed that existing evaluations are insufficient to understand clinical utility and risks, because LLMs might unexpectedly alter clinical decision-making, and that physicians might adopt the LLMs' assessments rather than using LLM responses to facilitate the communication of their own assessments. The study therefore serves as a call to action for a measured approach to implementing LLMs within electronic health records (EHRs), including evaluations that reflect how they will be used in clinical settings and consideration of human factors.
Chen, S., Guevara, M., Moningi, S., Hoebers, F., Elhalawani, H., Kann, B. H., Chipidza, F. E., Leeman, J., Aerts, H. J. W. L., Miller, T., Savova, G. K., Gallifant, J., Celi, L. A., Mak, R. H., Lustberg, M., Afshar, M., & Bitterman, D. S. (2024). The effect of using a large language model to respond to patient messages. The Lancet Digital Health. https://doi.org/10.1016/s2589-7500(24)00060-8
Background
Administrative tasks, including documentation in electronic health records (EHRs), have increased clinicians' workload and contributed to burnout.
To address this, large language models (LLMs) are being deployed to streamline clinical and administrative tasks.
For example, Epic, a major EHR vendor, has integrated OpenAI's GPT models, including GPT-4, to draft replies to electronic patient messages sent through online portals.
Study Objective
This study evaluated the effect and safety of LLM-assisted patient messaging on subjective efficiency, clinical recommendations, and potential harms.
Methodology
The study was conducted in two stages in 2023 at Brigham and Women's Hospital, Boston, MA, USA.
In stage 1, six board-certified attending radiation oncologists responded to patient messages as they would in routine clinical practice.
In stage 2, the same physicians edited LLM-generated draft responses until they were clinically acceptable.
The effect of LLM assistance on patient messaging was evaluated through surveys and content analysis of responses.
Results
The study found that LLM drafts were generally acceptable and posed minimal risk of harm.
However, a small proportion of LLM drafts, if left unedited, could have led to severe harm or even death.
The assessing physicians reported that the LLM drafts improved subjective efficiency in 76.9% of cases.
The content of physician responses changed when using LLM assistance, suggesting automation bias and anchoring, which could have downstream effects on patient outcomes.
Conclusion
The study highlights the potential benefits and risks of using LLMs in clinical settings.
While LLMs may reduce physician workload and improve consistency across physician responses, they may also alter clinical decision-making and pose risks to patient safety.
Therefore, it is essential to thoroughly evaluate LLMs in their intended clinical contexts and exercise caution when implementing these advanced technologies in healthcare.