Photo Credit: Artemis Diana
The following is a summary of “Assessing the medical reasoning skills of GPT-4 in complex ophthalmology cases,” published in the February 2024 issue of the British Journal of Ophthalmology by Milad et al.
Researchers conducted a retrospective study to evaluate GPT-4’s competence in answering questions about complex clinical ophthalmology cases.
They examined GPT-4’s performance on 422 Journal of the American Medical Association Ophthalmology Clinical Challenges, prompting the model to give a diagnosis (open-ended question) and to choose the next step (multiple-choice question). To enhance the model’s reasoning, responses were generated through two zero-shot prompting approaches, including zero-shot plan-and-solve+ (PS+). The top-performing model was then benchmarked against human graders.
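To make the prompting setup concrete, here is a minimal sketch of how a zero-shot plan-and-solve style prompt might be assembled for one case. The trigger sentence is a loose paraphrase of the general plan-and-solve idea (plan first, then reason step by step); it is an assumption for illustration and not the wording used in the study. The function name `build_prompt` and its parameters are likewise hypothetical.

```python
# Hypothetical illustration only: this is NOT the prompt used in the study.
# The trigger paraphrases the general plan-and-solve idea of asking the
# model to plan before reasoning step by step.
PLAN_AND_SOLVE_TRIGGER = (
    "Let's first understand the case, extract the relevant findings, "
    "and devise a plan. Then, let's carry out the plan step by step "
    "and state the final answer."
)


def build_prompt(case_text, question, options=None):
    """Assemble a zero-shot prompt for one clinical case.

    options: optional dict mapping a choice label (e.g. "A") to its text,
    used for the multiple-choice next-step question. Leave it as None for
    the open-ended diagnosis question.
    """
    parts = [f"Case: {case_text}", f"Question: {question}"]
    if options:
        # List the choices in the order given, one per line.
        parts.extend(f"{label}. {text}" for label, text in options.items())
    # Zero-shot: no worked examples are included, only the trigger sentence.
    parts.append(PLAN_AND_SOLVE_TRIGGER)
    return "\n".join(parts)


diagnosis_prompt = build_prompt(
    "A patient presents with ...",  # placeholder case text
    "What is the most likely diagnosis?",
)
next_step_prompt = build_prompt(
    "A patient presents with ...",
    "What would you do next?",
    options={"A": "Observe", "B": "Order imaging"},  # made-up choices
)
```

The same case text feeds both prompts; only the question and the presence of answer choices differ, mirroring the open-ended and multiple-choice tasks described above.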
The results showed that, with PS+ prompting, GPT-4 achieved mean accuracies of 48.0% (95% CI: 43.1% to 52.9%) for diagnosis and 63.0% (95% CI: 58.2% to 67.6%) for the next step.
Next-step accuracy did not differ significantly across subspecialties (p=0.44), whereas diagnostic accuracy was higher in pathology and tumors than in uveitis (p=0.027). When the diagnosis was correct, 75.2% (95% CI: 68.6% to 80.9%) of the next steps were correct; when the diagnosis was incorrect, 50.2% (95% CI: 43.8% to 56.6%) were. Overall, the likelihood of a correct next step was three times higher when the initial diagnosis was correct (p<0.001). No statistically significant differences in diagnostic or next-step accuracy were found between board-certified ophthalmologists and GPT-4. Among trainees, however, senior residents showed higher diagnostic accuracy (p≤0.001 and 0.049) and next-step accuracy (p=0.002 and 0.020) than GPT-4.
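The "three times" figure is easiest to reconcile with the reported percentages if it is read as an odds ratio rather than a ratio of the two proportions. That reading is an assumption, since the summary does not say which measure was used. The short sketch below computes both from the point estimates of 75.2% and 50.2%; the function and variable names are illustrative.

```python
def odds(p):
    """Convert a probability into odds, p / (1 - p)."""
    return p / (1.0 - p)


# Point estimates reported in the summary.
p_next_given_correct_dx = 0.752    # next step correct when diagnosis correct
p_next_given_incorrect_dx = 0.502  # next step correct when diagnosis incorrect

# Odds ratio: how many times larger the odds of a correct next step are
# when the diagnosis was correct.
odds_ratio = odds(p_next_given_correct_dx) / odds(p_next_given_incorrect_dx)

# Ratio of proportions (relative risk), shown for comparison.
proportion_ratio = p_next_given_correct_dx / p_next_given_incorrect_dx

print(f"odds ratio: {odds_ratio:.2f}")
print(f"ratio of proportions: {proportion_ratio:.2f}")
```

The odds ratio comes out close to 3, while the ratio of the two proportions is only about 1.5, so the reported threefold difference matches the former.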
Investigators concluded that GPT-4 shows potential for clinical support, improving with tailored prompts but still needing development to meet expert levels in specific areas like ophthalmology.
Source: bjo.bmj.com/content/early/2024/02/16/bjo-2023-325053