Assessing robustness and generalization of a deep neural network for brain MS lesion segmentation on real-world data

Chaves, Hernán; Serra, María Mercedes; Shalom, Diego E.; Ananía, Pilar; Rueda, Fernanda; Osa Sanz, Emilia; Stefanoff, Nadia Ivanna; Rodríguez Murúa, Sofía; Costa, Martín Elías; Kitamura, Felipe C.; Yañez, Paulina; Cejas, Claudia Patricia; Correale, Jorge; Ferrante, Enzo; Fernández Slezak, Diego; Farez, Mauricio Franco

URI: https://doi.org/10.1007/s00330-023-10093-5
https://repositorio.fleni.org.ar/xmlui/handle/123456789/1012

Date: 2023-08-31

Abstract:

Objectives: Evaluate the performance of a deep learning (DL)-based model for multiple sclerosis (MS) lesion segmentation and compare it to other DL and non-DL algorithms.

Methods: This ambispective, multicenter study assessed the performance of a DL-based model for MS lesion segmentation and compared it to alternative DL- and non-DL-based methods. Models were tested on internal (n = 20) and external (n = 18) datasets from Latin America, and on an external dataset from Europe (n = 49). We also examined robustness by rescanning six patients from our MS clinical cohort. Moreover, we studied inter-human annotator agreement and discussed our findings in light of these results. Performance and robustness were assessed using intraclass correlation coefficient (ICC), Dice coefficient (DC), and coefficient of variation (CV).

Results: Inter-human ICC ranged from 0.89 to 0.95, while spatial agreement among annotators showed a median DC of 0.63. Using expert manual segmentations as ground truth, our DL model achieved a median DC of 0.73 on the internal, 0.66 on the external, and 0.70 on the challenge datasets. The performance of our DL model exceeded that of the alternative algorithms on all datasets. In the robustness experiment, our DL model also achieved higher DC (ranging from 0.82 to 0.90) and lower CV (ranging from 0.7 to 7.9%) when compared to the alternative methods.

Conclusion: Our DL-based model outperformed alternative methods for brain MS lesion segmentation. The model also generalized well to unseen data and showed robust performance and low processing times on both real-world and challenge-based data.

Clinical relevance statement: Our DL-based model demonstrated superior performance in accurately segmenting brain MS lesions compared to alternative methods, indicating its potential for clinical application with improved accuracy, robustness, and efficiency.

Key points:
• Automated lesion load quantification in MS patients is valuable; however, more accurate methods are still necessary.
• A novel deep learning model outperformed alternative MS lesion segmentation methods on multisite datasets.
• Deep learning models are particularly suitable for MS lesion segmentation in clinical scenarios.
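
For readers unfamiliar with two of the metrics named in the abstract, the following is a minimal sketch of how a Dice coefficient and a coefficient of variation are commonly computed. It is not the authors' code. The function names, the flattened-mask representation, the choice of sample standard deviation, and the toy values are all assumptions made for illustration, and the abstract does not state which quantity the CV was computed over (lesion volume across rescans is assumed here).

```python
from statistics import mean, stdev
from typing import Sequence


def dice_coefficient(mask_a: Sequence[int], mask_b: Sequence[int]) -> float:
    """Dice coefficient between two flattened binary masks (1 = lesion voxel)."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * overlap / total


def coefficient_of_variation(values: Sequence[float]) -> float:
    """CV in percent: sample standard deviation divided by the mean."""
    return 100.0 * stdev(values) / mean(values)


# Toy data for illustration only; these are not values from the study.
manual_mask = [0, 1, 1, 1, 0, 0, 1, 0]
model_mask = [0, 1, 1, 0, 0, 1, 1, 0]
rescan_volumes_ml = [10.0, 10.5, 9.5]  # hypothetical lesion volumes per rescan

dice = dice_coefficient(manual_mask, model_mask)
cv_percent = coefficient_of_variation(rescan_volumes_ml)
print(f"Dice = {dice:.2f}, CV = {cv_percent:.1f}%")
```

A Dice value of 1 means the two masks coincide exactly and 0 means they share no lesion voxels; a lower CV across repeated scans of the same patient indicates more reproducible measurements.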
