Widely used AI tool for early sepsis detection may be cribbing doctors’ suspicions
When using only data collected before patients with sepsis received treatments or medical tests, the model’s accuracy was no better than a coin toss
Proprietary artificial intelligence software designed to be an early warning system for sepsis can’t differentiate high- and low-risk patients before they receive treatments, according to a new study from the University of Michigan.
The tool, named the Epic Sepsis Model, is part of Epic’s electronic medical record software, which serves 54% of patients in the United States and 2.5% of patients internationally, according to a statement from the company’s CEO reported by the Wisconsin State Journal. It automatically generates sepsis risk estimates in the records of hospitalized patients every 20 minutes, which clinicians hope will allow them to detect when a patient might get sepsis before the patient’s condition worsens.
“Sepsis has all these vague symptoms, so when a patient shows up with an infection, it can be really hard to know who can be sent home with some antibiotics and who might need to stay in the intensive care unit. We still miss a lot of patients with sepsis,” said Tom Valley, associate professor in pulmonary and critical care medicine, ICU clinician and co-author of the study published recently in the New England Journal of Medicine AI.
Sepsis is responsible for a third of all hospital deaths in the U.S., and early treatment is key to patient survival. The hope is that AI predictions could be instrumental in making that happen, but at present, they don’t seem to be getting more out of patient data than clinicians are.
“We suspect that some of the health data that the Epic Sepsis Model relies on encodes, perhaps unintentionally, clinician suspicion that the patient has sepsis,” said Jenna Wiens, associate professor of computer science and engineering and the corresponding author of the study.
Patients won’t receive blood culture tests and antibiotic treatments until they start presenting sepsis symptoms, for example. While such data could help an AI identify sepsis risk very accurately, it could also enter the medical records too late to help clinicians get ahead on treatments.
This mismatch in the timing between when information becomes available to the AI and when it’s most relevant to clinicians was evident in the researchers’ evaluation of how the Epic Sepsis Model performed for 77,000 adults hospitalized at University of Michigan Health, the clinical arm of Michigan Medicine.
The AI had already made estimates of each patient’s risk of getting sepsis in the medical center’s standard operations, so the researchers only had to pull the data and perform their analysis. Nearly 5% of the patients had sepsis.
To measure the AI’s performance, the team calculated the probability that the AI assigned higher risk scores to patients who were diagnosed with sepsis, compared to patients who were never diagnosed with sepsis.
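This kind of pairwise probability is commonly known as the area under the ROC curve (AUROC): a value of 0.5 is a coin toss, and 1.0 is perfect ranking. The following is a minimal Python sketch of the calculation, not the researchers’ code. The function name `pairwise_auroc` and the risk scores are made up for illustration.

```python
from itertools import product


def pairwise_auroc(sepsis_scores, non_sepsis_scores):
    """Fraction of (sepsis, non-sepsis) patient pairs in which the sepsis
    patient received the higher risk score. Ties count as half a win."""
    if not sepsis_scores or not non_sepsis_scores:
        raise ValueError("need at least one patient in each group")
    wins = 0.0
    for s, n in product(sepsis_scores, non_sepsis_scores):
        if s > n:
            wins += 1.0
        elif s == n:
            wins += 0.5
    return wins / (len(sepsis_scores) * len(non_sepsis_scores))


# Illustrative, made-up risk scores (not data from the study).
sepsis = [0.9, 0.6, 0.4]
no_sepsis = [0.5, 0.3, 0.2, 0.1]
print(round(pairwise_auroc(sepsis, no_sepsis), 3))
```

In this toy example, the sepsis patient outranks the non-sepsis patient in 11 of the 12 possible pairs, so the result is about 0.917.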
When including the predictions made by the AI at all stages of the patient’s hospital stay, the AI could correctly identify a high-risk patient 87% of the time. However, the AI was only correct 62% of the time when using patient data recorded before the patient met criteria for having sepsis. Perhaps most telling, the model only assigned higher risk scores to 53% of patients who got sepsis when predictions were restricted to before a blood culture had been ordered.
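To see why the evaluation window matters, consider a rough Python sketch of restricting a patient’s risk scores to those generated before a clinical event such as a blood culture order. This is an illustration under stated assumptions, not the study’s method: the helper `scores_before`, the timestamps and the scores are hypothetical, and summarizing a patient by the maximum score is an assumption the article does not confirm.

```python
from datetime import datetime


def scores_before(predictions, cutoff):
    """Keep only risk scores time-stamped strictly before the cutoff.
    If cutoff is None (the event never happened), keep every score."""
    return [score for ts, score in predictions if cutoff is None or ts < cutoff]


# Hypothetical patient: one score every 20 minutes (made-up values).
predictions = [
    (datetime(2024, 1, 1, 10, 0), 0.10),
    (datetime(2024, 1, 1, 10, 20), 0.15),
    (datetime(2024, 1, 1, 10, 40), 0.70),
    (datetime(2024, 1, 1, 11, 0), 0.85),
]
# Hypothetical time at which a clinician ordered a blood culture.
culture_ordered = datetime(2024, 1, 1, 10, 30)

all_scores = scores_before(predictions, None)
early_scores = scores_before(predictions, culture_ordered)

# Assumption: summarize each patient by the highest score in the window.
print(max(all_scores), max(early_scores))
```

Here the patient’s highest score over the whole stay is 0.85, but only 0.15 before the blood culture order. A model whose scores jump only after the order looks strong when judged on the full stay and weak when judged on the earlier window.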
The findings suggest that the model was keying in on whether patients received diagnostic tests or treatments when making predictions. At that point, clinicians already suspect that their patients have sepsis, so the AI predictions are unlikely to make a difference.
“We need to consider when in the clinical workflow the model is being evaluated when deciding if it’s helpful to clinicians,” said Donna Tjandra, doctoral student in computer science and engineering and co-author of the study. “Evaluating the model with data collected after the clinician has already suspected sepsis onset can make the model’s performance appear strong, but this does not align with what would aid clinicians in practice.”
This study was supported by Cisco Research, the National Science Foundation and Precision Health at the University of Michigan.