Washington: ChatGPT, the artificial intelligence chatbot, scored passing or near-passing results on the US medical licensing exam, according to a study published on Thursday.
“Reaching the passing score for this notoriously difficult expert exam, and doing so without any human reinforcement, marks a notable milestone in clinical AI maturation,” said the authors of the study published in the journal PLOS Digital Health.
“These results suggest that large language models may have the potential to assist with medical education, and potentially, clinical decision-making,” they said.
ChatGPT, which is able to produce essays, poems and programming code within seconds, was developed by OpenAI, a California-based startup founded in 2015 with early funding from Elon Musk among others. Microsoft invested $1 billion in OpenAI in 2019 and just inked a new multi-billion-dollar deal with the firm.
For the study, researchers at California-based AnsibleHealth tested ChatGPT’s performance on a three-part licensing exam taken by medical students and physicians-in-training in the United States.
The standardized exam tests knowledge across multiple medical disciplines, from basic science and biochemistry to diagnostic reasoning and bioethics.
The AI system was tested on 350 of the 376 public questions on the June 2022 version of the exam, the study said, and the chatbot was not given any specialized training ahead of time.
Image-based questions were removed. ChatGPT scored between 52.4 percent and 75 percent across the three parts of the exam. A passing grade is around 60 percent.
According to the study, the first part of the exam, which focuses on basic science and pharmacology, is typically taken by medical students who have put in 300-400 hours of dedicated study time. The second part is generally taken by fourth-year medical students and emphasizes clinical reasoning, medical management and bioethics.
The final section is for physicians who have completed at least six months to a year of postgraduate medical education.
Dr Google and Nurse Bing
The questions were presented to ChatGPT in various formats including open-ended prompting such as “What would be the patient’s diagnosis based on the information provided?”
There were also multiple choice questions such as: “The patient’s condition is mostly caused by which of the following pathogens?”
Two physician adjudicators who were blinded to each other reviewed the responses to come up with the final grades, the study said.
An outside expert, Simon McCallum, a senior lecturer in software engineering at Victoria University of Wellington, New Zealand, noted that Google has received encouraging results with an AI medical tool known as Med-PaLM.
“ChatGPT may pass the exam, but Med-PaLM is able to give advice to patients that is as good as a professional GP,” McCallum said. “And both of these systems are improving.
“Society is about to change, and instead of warning about the hypochondria of randomly searching the internet for symptoms, we may soon get our medical advice from Doctor Google or Nurse Bing.”
ChatGPT also proved useful to the authors of the medical exam study in another way. They used the chatbot to help write it, said co-author Tiffany Kung.