The following is a summary of “ICGA-GPT: report generation and question answering for indocyanine green angiography images” by Chen et al., published online in March 2024 in the British Journal of Ophthalmology.
Researchers conducted a retrospective study to develop a bilingual (English and Chinese) system for automated indocyanine green angiography (ICGA) report generation and question answering (QA), with the aim of improving diagnostic efficiency.
They utilized 213,129 ICGA images from 2,919 participants. The system consisted of two stages: aligning images and text to generate reports with a multimodal transformer architecture, then performing QA over the ICGA text reports and human-input questions with a large language model (LLM). Performance was evaluated with quantitative metrics: Bilingual Evaluation Understudy (BLEU), Consensus-based Image Description Evaluation (CIDEr), Recall-Oriented Understudy for Gisting Evaluation-Longest Common Subsequence (ROUGE-L), Semantic Propositional Image Caption Evaluation (SPICE), accuracy, sensitivity, specificity, precision, and F1 score. Additionally, three experienced ophthalmologists subjectively assessed output quality on a 5-point scale, with 5 denoting high quality.
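To illustrate one of the text-overlap metrics named above, the sketch below computes ROUGE-L, the longest-common-subsequence F-measure between a reference report and a generated one. This is a minimal from-scratch implementation for illustration only; the sample sentences and the equal recall/precision weighting (beta = 1) are assumptions, not details from the study.

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l(reference, candidate, beta=1.0):
    """ROUGE-L F-measure: combines LCS-based recall and precision."""
    ref, cand = reference.split(), candidate.split()
    lcs = lcs_len(ref, cand)
    if lcs == 0:
        return 0.0
    recall = lcs / len(ref)        # how much of the reference is recovered
    precision = lcs / len(cand)    # how much of the candidate is supported
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)

# Hypothetical reference vs. generated report fragment (not from the paper).
print(round(rouge_l("hyperfluorescent spots in the macula",
                    "hyperfluorescent spots seen in the macula"), 2))
```

In practice, evaluation toolkits compute this over tokenized sentences of whole reports; the dynamic-programming LCS here is quadratic in sentence length, which is fine at report scale.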
After bilingual translation, the dataset comprised 8,757 ICGA reports (66.7% English, 33.3% Chinese) covering 39 disease-related conditions. For report generation, the ICGA-GPT model achieved BLEU-1 through BLEU-4 scores of 0.48, 0.44, 0.40, and 0.37; a CIDEr of 0.82; a ROUGE-L of 0.41; and a SPICE of 0.18. Disease-specific metrics showed average specificity, accuracy, precision, sensitivity, and F1 scores of 0.98, 0.94, 0.70, 0.68, and 0.64, respectively. In the subjective assessment of 50 images (100 reports), the three ophthalmologists showed substantial agreement (kappa = 0.723 for completeness, kappa = 0.738 for accuracy) and gave mean scores ranging from 3.20 to 3.55. For the QA task, evaluated on 100 answers, the ophthalmologists gave mean scores of 4.24, 4.22, and 4.10, with high inter-rater consistency (kappa = 0.779).
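The disease-specific metrics reported above all derive from a per-disease confusion matrix. The sketch below shows the standard definitions; the counts are made up for illustration and are not values from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fp/tn/fn: true/false positives and negatives for one disease label.
    """
    sensitivity = tp / (tp + fn)                  # recall: positives found
    specificity = tn / (tn + fp)                  # negatives correctly rejected
    precision = tp / (tp + fp)                    # positive calls that are right
    accuracy = (tp + tn) / (tp + fp + tn + fn)    # overall correct fraction
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }

# Hypothetical counts for one disease label across 100 reports.
m = classification_metrics(tp=17, fp=8, tn=70, fn=5)
print({k: round(v, 2) for k, v in m.items()})
```

In a multi-label setting like this study's 39 conditions, such metrics are computed per condition and then averaged, which is consistent with the "average" values quoted above.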
Investigators concluded that their study established ICGA-GPT as the first model for automated ICGA report generation and interactive QA, highlighting the potential of LLMs to aid in ICGA image interpretation.
Source: bjo.bmj.com/content/early/2024/03/25/bjo-2023-324446