A study conducted by physician-investigators at Beth Israel Deaconess Medical Center (BIDMC) has compared the probabilistic reasoning of an AI chatbot with that of human clinicians, revealing notable differences between the two.
This research not only opens new avenues in the application of artificial intelligence in medicine but also highlights its potential as a clinical decision support tool for physicians.
Adam Rodman, MD, the study’s corresponding author, emphasizes the inherent difficulty humans have with probabilistic reasoning. This practice, crucial to making informed decisions, rests on calculating odds, a task at which people are not naturally adept.
“Humans struggle with probabilistic reasoning, the practice of making decisions based on calculating odds,” said Rodman.
Rodman, an internal medicine physician and investigator in the Department of Medicine at BIDMC, notes that diagnosis is a complex process involving a variety of cognitive strategies, with probabilistic reasoning a significant yet challenging component.
“Probabilistic reasoning is one of several components of making a diagnosis, which is an incredibly complex process that uses a variety of different cognitive strategies. We chose to evaluate probabilistic reasoning in isolation because it is a well-known area where humans could use support.”
Because it is a well-known area where human performance can be improved, the study set out to explore how artificial intelligence can support this aspect of clinical decision-making.
The research team based their study on a previously conducted national survey. This survey included over 550 practitioners performing probabilistic reasoning across five medical cases.
ChatGPT-4, a publicly available large language model (LLM), was given the same cases and prompted identically 100 times to generate a range of responses.
Both the chatbot and the human practitioners were tasked with estimating the likelihood of diagnoses based on patient presentations. They were then provided with test results from various medical diagnostics, including chest radiography, mammography, stress tests, and urine cultures, to update their diagnoses.
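For readers unfamiliar with this kind of updating, here is a minimal sketch of the arithmetic involved, using Bayes' theorem in its likelihood-ratio form. The test characteristics and probabilities below are hypothetical illustrations, not values drawn from the study's cases.

```python
# Illustrative sketch (not from the study): updating a pretest probability
# after a test result, using Bayes' theorem via likelihood ratios.
# The test characteristics and numbers here are hypothetical.

def posttest_probability(pretest_prob: float, sensitivity: float,
                         specificity: float, test_positive: bool) -> float:
    """Convert a pretest probability into a posttest probability for one test result."""
    # The likelihood ratio depends on whether the result is positive or negative.
    if test_positive:
        lr = sensitivity / (1 - specificity)          # LR+
    else:
        lr = (1 - sensitivity) / specificity          # LR-
    pretest_odds = pretest_prob / (1 - pretest_prob)  # probability -> odds
    posttest_odds = pretest_odds * lr                 # Bayes' rule in odds form
    return posttest_odds / (1 + posttest_odds)        # odds -> probability

# Example: a hypothetical test with 90% sensitivity and 80% specificity,
# applied to a patient with a 30% pretest probability of disease.
print(posttest_probability(0.30, 0.90, 0.80, test_positive=True))   # ~0.66
print(posttest_probability(0.30, 0.90, 0.80, test_positive=False))  # ~0.05
```

The negative-result case shows the point at issue in the study: a negative test should pull the probability of disease well below the pretest estimate, and it is exactly this downward adjustment that clinicians tend to underdo.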
The findings revealed a nuanced picture of diagnostic accuracy. When test results were positive, the chatbot’s performance was mixed: it was more accurate than the human doctors in two cases, similarly accurate in two, and less accurate in one. When test results were negative, however, the chatbot was more accurate than the doctors in all five cases.
Rodman points out a critical tendency in human clinicians. “Humans sometimes feel the risk is higher than it is after a negative test result, which can lead to overtreatment, more tests and too many medications,” said Rodman.
This perception of elevated risk after a negative test result was a recurring source of error for the doctors relative to the AI, underscoring the potential of such tools to mitigate the bias.
While the comparative performance of chatbots and humans is intriguing, Rodman is more interested in how access to such AI technologies could improve the performance of skilled doctors in clinical settings, a question he says future research should focus on.
Rodman clarifies that LLMs such as ChatGPT-4 do not calculate probabilities the way epidemiologists or poker players do. “LLMs can’t access the outside world — they aren’t calculating probabilities the way that epidemiologists, or even poker players, do. What they’re doing has a lot more in common with how humans make spot probabilistic decisions,” he said.
That resemblance to human on-the-spot judgments is what Rodman finds exciting: it hints at the potential for AI to integrate seamlessly into clinical workflows and support better decision-making.
“Even if imperfect, their ease of use and ability to be integrated into clinical workflows could theoretically make humans make better decisions,” he said. “Future research into collective human and artificial intelligence is sorely needed.”
In summary, the study by BIDMC marks a significant step in understanding and leveraging AI in medicine. By highlighting the strengths and limitations of both human doctors and AI in probabilistic reasoning, it paves the way for a future where AI could play a crucial role in supporting and enhancing clinical decision-making, ultimately leading to better patient outcomes.
The full study was published in the journal JAMA Network Open.