Treatment options for patients with relapsed/refractory diffuse large B-cell lymphoma (DLBCL) are limited, and their survival rate is low. Predicting each patient's risk of recurrence could help clinicians choose chemotherapy regimens that extend the period of long-term remission. Because current strategies do not meet this need, we established predictive models that distinguish patients with DLBCL in complete remission who relapsed within 2 years from those who did not.
We assessed 518 patients with DLBCL treated between January 2011 and July 2016 and recorded 52 variables for each patient. Seventeen variables were first selected using variable selection methods (Lasso, adaptive Lasso, and elastic net). We then built classifiers and probability models for imbalanced data by combining SMOTE sampling, cost-sensitive learning, and ensemble learning (AdaBoost, voting, and stacking) with each of three machine learning methods: support vector machine (SVM), back-propagation artificial neural network, and random forest. Finally, we assessed the performance of each model.
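The SMOTE step can be illustrated with a minimal pure-Python sketch, assuming that recurrence is the minority class. Each synthetic sample is placed at a random point on the segment between a minority sample and one of its k nearest minority-class neighbours. The helper names `smote` and `balance`, the `k=5` default, and the random choice of base samples are illustrative assumptions, not the authors' implementation; libraries such as imbalanced-learn provide a production version.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """SMOTE-style oversampling (hypothetical helper, not the paper's code).

    minority: list of feature vectors (lists of floats) from the minority class.
    Returns n_synthetic new vectors interpolated between minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority-class neighbours of `base`, excluding base itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class until both classes have equal counts.

    Intended for the training split only, so test data keep their real class ratio.
    """
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    n_needed = (len(y) - len(minority)) - len(minority)
    new = smote(minority, max(n_needed, 0), k=k, seed=seed)
    return X + new, y + [minority_label] * len(new)
```

Because synthetic points are convex combinations of real minority samples, they stay inside the region spanned by the minority class rather than duplicating existing records.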
Disease stage and five other variables were significant indicators of recurrence. Among all classifiers, SVM with AdaBoost ensemble learning trained on SMOTE-sampled data performed best (sensitivity = 97.3%, AUC = 96%, RMSE = 19.6%, G-mean = 96%). Among the probability models, SVM with AdaBoost (AUC = 98.7%, RMSE = 17.7%, MXE = 12.7%, Cal mean = 3.2%, BS0 = 2.5%, BS1 = 4%, BSALL = 3.1%) and random forest (AUC = 99.5%, RMSE = 19.8%, MXE = 16.2%, Cal mean = 9.1%, BS0 = 4.8%, BS1 = 2.9%, BSALL = 3.9%), both trained on SMOTE-sampled data, performed well.
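The abstract reports these metrics without defining them. A minimal pure-Python sketch, assuming the usual definitions: G-mean as the geometric mean of sensitivity and specificity, AUC in its rank-based (Mann-Whitney) form, MXE as mean cross-entropy with the natural log, and BS0, BS1, and BSALL as Brier scores over non-recurrence cases, recurrence cases, and all cases. These readings and all function names are assumptions rather than the paper's definitions; Cal mean is omitted because its exact formula is not stated.

```python
import math


def _counts(y_true, y_pred):
    """Return (tp, fn, tn, fp) for binary labels where 1 = recurrence."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1:
            tp, fn = (tp + 1, fn) if p == 1 else (tp, fn + 1)
        else:
            fp, tn = (fp + 1, tn) if p == 1 else (fp, tn + 1)
    return tp, fn, tn, fp


def sensitivity(y_true, y_pred):
    tp, fn, _, _ = _counts(y_true, y_pred)
    return tp / (tp + fn)


def specificity(y_true, y_pred):
    _, _, tn, fp = _counts(y_true, y_pred)
    return tn / (tn + fp)


def g_mean(y_true, y_pred):
    # Geometric mean of sensitivity and specificity; low if either class is neglected.
    return math.sqrt(sensitivity(y_true, y_pred) * specificity(y_true, y_pred))


def auc(y_true, scores):
    # Fraction of (positive, negative) pairs ranked correctly; ties count 1/2.
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier(y_true, prob, cls=None):
    # Mean squared error of predicted probabilities; cls=0 or 1 restricts
    # the average to cases of that true class (assumed BS0 / BS1 reading).
    pairs = [(t, p) for t, p in zip(y_true, prob) if cls is None or t == cls]
    return sum((p - t) ** 2 for t, p in pairs) / len(pairs)


def rmse(y_true, prob):
    # Root-mean-squared error of probabilities = sqrt of the overall Brier score.
    return math.sqrt(brier(y_true, prob))


def mxe(y_true, prob, eps=1e-15):
    # Mean cross-entropy (natural log); clip probabilities to avoid log(0).
    total = 0.0
    for t, p in zip(y_true, prob):
        p = min(max(p, eps), 1 - eps)
        total += t * math.log(p) + (1 - t) * math.log(1 - p)
    return -total / len(y_true)
```

Under these definitions RMSE is the square root of the overall Brier score, which agrees with the reported pairs: 0.177² ≈ 0.031 and 0.198² ≈ 0.039, matching BSALL = 3.1% and 3.9%. The overall Brier score is then the class-size-weighted average of BS0 and BS1.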
This predictive model is highly accurate for almost all patients with DLBCL, and the six indicators can serve as signals of recurrence.

Copyright © 2020 Elsevier B.V. All rights reserved.