Letter to the editor: testing the generalizability of DeepPlantAllergy on challenging allergen prediction scenarios


Dolu K. O.

BRIEFINGS IN BIOINFORMATICS, vol.27, no.2, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Letter
  • Volume: 27 Issue: 2
  • Publication Date: 2026
  • Doi Number: 10.1093/bib/bbag149
  • Journal Name: BRIEFINGS IN BIOINFORMATICS
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, BIOSIS, Library, Information Science & Technology Abstracts (LISTA), MEDLINE, Directory of Open Access Journals
  • Istanbul University Affiliated: Yes

Abstract

Dhouib et al. (DeepPlantAllergy: deep learning for explainable prediction of allergenicity in plant proteins. Brief Bioinform 2025;26:bbaf605.) developed DeepPlantAllergy, a deep learning model for predicting allergenicity in plant proteins, and reported an area under the receiver operating characteristic curve (ROC-AUC) of approximately 97.7-97.8% on an independent test set. However, the way the dataset was constructed may lead to optimistic performance estimates. Specifically, non-allergen sequences sharing >20% identity with allergens were removed before the train/test split, which can reduce the presence of "hard negatives" (moderately similar non-allergens) in the test set and thereby weaken assessment under realistic screening conditions. Because practical allergen screening requires discrimination against large numbers of non-allergens that may share moderate sequence identity with allergens, we suggest re-evaluating the model on test sets that retain challenging negatives (with filtering performed against training allergens only) and reporting precision-recall metrics (area under the precision-recall curve) alongside ROC-AUC to better reflect performance under class imbalance.
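
The point about class imbalance can be illustrated with a minimal sketch. The scores below are invented toy values, not DeepPlantAllergy outputs, and the helper names `roc_auc` and `average_precision` are hypothetical, written here in plain Python for illustration only. The sketch assumes that the negatives in the larger screening set follow the same score distribution as in the balanced set, so that only the class ratio changes.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs in which the
    positive receives the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision (a step-wise estimate of the area under the
    precision-recall curve): mean precision at the rank of each positive.
    Assumes no positive shares a score with a negative."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy scores (assumed for illustration): 4 allergens, 4 non-allergens.
pos_scores = [0.9, 0.8, 0.7, 0.4]
neg_scores = [0.6, 0.3, 0.2, 0.1]

# Balanced test set: 1 negative per positive.
balanced_labels = [1] * len(pos_scores) + [0] * len(neg_scores)
balanced_scores = pos_scores + neg_scores

# Imbalanced screening set: 10 negatives per positive, same score distribution.
many_neg = neg_scores * 10
imbalanced_labels = [1] * len(pos_scores) + [0] * len(many_neg)
imbalanced_scores = pos_scores + many_neg

balanced_auc = roc_auc(balanced_labels, balanced_scores)
imbalanced_auc = roc_auc(imbalanced_labels, imbalanced_scores)
balanced_ap = average_precision(balanced_labels, balanced_scores)
imbalanced_ap = average_precision(imbalanced_labels, imbalanced_scores)

print(f"ROC-AUC  balanced={balanced_auc:.4f}  imbalanced={imbalanced_auc:.4f}")
print(f"AP       balanced={balanced_ap:.4f}  imbalanced={imbalanced_ap:.4f}")
```

In this toy setting ROC-AUC is identical for both sets, because it depends only on how positives rank against negatives, whereas average precision falls once each high-scoring negative is replicated tenfold. This is the sense in which a precision-recall metric may expose weaknesses that ROC-AUC alone can hide when non-allergens greatly outnumber allergens.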