Letter to the editor: testing the generalizability of DeepPlantAllergy on challenging allergen prediction scenarios


Dolu K. O.

BRIEFINGS IN BIOINFORMATICS, vol.27, no.2, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Short Article
  • Volume: 27, Issue: 2
  • Publication Date: 2026
  • DOI: 10.1093/bib/bbag149
  • Journal Name: BRIEFINGS IN BIOINFORMATICS
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, BIOSIS, Library, Information Science & Technology Abstracts (LISTA), MEDLINE, Directory of Open Access Journals
  • Affiliated with Istanbul University: Yes

Abstract

Dhouib et al. (DeepPlantAllergy: deep learning for explainable prediction of allergenicity in plant proteins. Brief Bioinform 2025;26:bbaf605.) developed DeepPlantAllergy, a deep learning model for predicting allergenicity in plant proteins, and reported an area under the receiver operating characteristic curve (ROC-AUC) of approximately 97.7-97.8% on an independent test set. However, the way the dataset was constructed may lead to optimistic performance estimates. Specifically, non-allergen sequences sharing >20% identity with allergens were removed before the train/test split. This can reduce the number of "hard negatives" (moderately similar non-allergens) in the test set and thereby weaken the assessment of performance under realistic screening conditions. Because practical allergen screening requires discriminating allergens from large numbers of non-allergens that may share moderate sequence identity with them, we suggest re-evaluating the model on test sets that retain challenging negatives (with filtering performed against training allergens only) and reporting precision-recall metrics (area under the precision-recall curve) alongside ROC-AUC to better reflect performance under class imbalance.
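
The evaluation suggested above can be illustrated with a minimal Python sketch. It is not the DeepPlantAllergy pipeline or the letter's own code. It assumes one possible reading of "filtering performed against training allergens only", namely that a candidate non-allergen is discarded only if it exceeds the 20% identity threshold against a training allergen. The function names, the `toy_identity` measure (a stand-in for a real alignment-based identity) and all sequences and scores are hypothetical toy values chosen for illustration.

```python
from typing import Callable, List, Sequence


def toy_identity(a: str, b: str) -> float:
    """Hypothetical stand-in for alignment-based identity:
    fraction of matching positions over the shorter sequence."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n if n else 0.0


def filter_negatives(
    negatives: Sequence[str],
    train_allergens: Sequence[str],
    identity: Callable[[str, str], float] = toy_identity,
    threshold: float = 0.20,
) -> List[str]:
    """Keep non-allergens whose identity to every TRAINING allergen is
    at most `threshold`; test allergens are deliberately not consulted."""
    return [
        neg for neg in negatives
        if all(identity(neg, a) <= threshold for a in train_allergens)
    ]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive outscores
    a random negative (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision, a common estimate of the area under the
    precision-recall curve. Assumes no positive is tied with a negative."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / hits


# Toy filtering: "AAAC" resembles the training allergen and is dropped;
# "CCCG" resembles only an unseen test allergen ("CCCC") and is retained.
train_allergens = ["AAAA"]
kept = filter_negatives(["AAAC", "CCCG", "GGGG"], train_allergens)

# Toy imbalanced test set: 2 allergens, 2 hard negatives, 16 easy negatives.
labels = [1, 1] + [0, 0] + [0] * 16
scores = [0.9, 0.6] + [0.8, 0.7] + [0.1] * 16

print("retained negatives:", kept)
print("ROC-AUC:", round(roc_auc(labels, scores), 4))
print("average precision:", round(average_precision(labels, scores), 4))
```

On this toy data, the two hard negatives lower ROC-AUC only slightly (34 of 36 positive-negative pairs are still ranked correctly) while average precision falls to 0.75, which illustrates why the two metrics are worth reporting together when negatives greatly outnumber positives.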