Clinical Imaging, vol. 132, 2026 (SCI-Expanded, Scopus)
Purpose: Programmed cell death ligand-1 (PD-L1) is a key prognostic and predictive biomarker for immunotherapy in non-small cell lung cancer (NSCLC). This study aimed to develop a machine-learning model using CT-based radiomic features to predict PD-L1 expression status in NSCLC patients.

Materials and Methods: This retrospective study included 215 patients (mean age, 63.4 ± 9.1 years; range, 36–82 years) with histopathologically confirmed NSCLC and available PD-L1 immunohistochemistry results. Tumors were manually segmented on pretreatment non-contrast CT images, and 230 radiomic features were extracted in accordance with Image Biomarker Standardization Initiative guidelines. Features with >50% missing values were excluded, remaining missing values were imputed by mean, and ComBat harmonization was applied to mitigate inter-scanner variability. Recursive Feature Elimination and SelectFromModel yielded 30 informative predictors. Seven supervised algorithms were tested; Random Forest, XGBoost, and a Hybrid RFE–CatBoost (HRFC) model were retained for detailed comparison. Model performance was assessed by five-fold cross-validation using accuracy, F1-score, and area under the ROC curve (AUC) with 95% confidence intervals (CIs).

Results: The HRFC model achieved the best performance, with 90.5% accuracy, 88.7% F1-score, and a mean cross-validated AUC of 0.93 (95% CI, 0.89–0.98). XGBoost and Random Forest achieved mean cross-validated AUCs of 0.86 (95% CI, 0.80–0.94) and 0.82 (95% CI, 0.75–0.91), respectively. The HRFC model significantly outperformed Random Forest (ΔAUC = 0.11, p = 0.017), while its difference from XGBoost was not significant.

Conclusion: The CT-based Hybrid RFE–CatBoost model enables accurate, reproducible prediction of PD-L1 expression in NSCLC, providing a promising noninvasive tool.
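
The missing-value handling and the AUC metric described in the methods can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation, which the abstract does not describe. The function names (`drop_sparse_features`, `impute_mean`, `roc_auc`) and the toy data are hypothetical, and ComBat harmonization, Recursive Feature Elimination, CatBoost, and cross-validation are deliberately left out.

```python
import math

NAN = float("nan")


def drop_sparse_features(rows, max_missing_frac=0.5):
    """Return indices of feature columns to keep.

    A column is excluded when more than `max_missing_frac` of its
    values are missing (NaN), mirroring the ">50% missing" rule.
    """
    n_rows = len(rows)
    keep = []
    for j in range(len(rows[0])):
        n_missing = sum(1 for row in rows if math.isnan(row[j]))
        if n_missing / n_rows <= max_missing_frac:
            keep.append(j)
    return keep


def impute_mean(rows, columns):
    """Keep only `columns`; replace each NaN with that column's observed mean."""
    means = {}
    for j in columns:
        observed = [row[j] for row in rows if not math.isnan(row[j])]
        means[j] = sum(observed) / len(observed)
    return [
        [means[j] if math.isnan(row[j]) else row[j] for j in columns]
        for row in rows
    ]


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy feature matrix (hypothetical values): 4 patients x 3 features.
features = [
    [1.0, NAN, 0.2],
    [2.0, NAN, NAN],
    [3.0, NAN, 0.4],
    [NAN, 5.0, 0.6],
]
kept = drop_sparse_features(features)   # column 1 is 75% missing and is dropped
clean = impute_mean(features, kept)     # remaining NaNs become column means
print(kept)                             # [0, 2]
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

In a cross-validated setting such as the five-fold scheme reported here, the column means would ordinarily be computed on the training folds only and then applied to the held-out fold. The abstract does not state whether this was done.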