Machine learning for predicting the diagnosis of tuberculous versus malignant pleural effusion: External validation and accuracy in two different settings
Author
Arnay del Arco, Rafael; Garcia Zamalloa, Alberto; Castilla Rodríguez, Iván; Mar, Javier; González Cava, José Manuel; Ibarrondo, Oliver; Salegui, Iñaki; Miguel, Juan Antonio de; Mugica, Nekane; Aguinagalde, Borja; Zabaleta, Jon; Basauri, Begoña; Alonso, Marta; Azcue, Nekane; Gil, Eva; Garmendia, Irati; Taboada, Jorge
Date
2025
Abstract
Objective
To perform an external validation of a previously reported machine learning (ML) approach for predicting the diagnosis of pleural tuberculosis.
Patients and Methods
We defined two cohorts: a Training group, comprising 273 out of 1,220 effusions from our prospective study (2013–2022); and a Testing group, from a retrospective analysis of 360 effusions from 832 consecutive patients in Bajo Deba health district (1996–2012). All the effusions included were exudative and lymphocytic. In Training and Testing groups respectively, 49 and 104 cases were tuberculous, 143 and 92 were malignant, and 81 and 164 were diagnosed with “other diseases”; pre-test probabilities of pleural tuberculosis were 4% and 12.7%. Variables included were: age, pH, adenosine deaminase, glucose, protein, and lactate dehydrogenase levels, and white cell counts (total and differential) in pleural fluid. We used two ML classifiers: binary (tuberculous and non-tuberculous), and three-class (tuberculous, malignant, and others); and compared them with Bayesian analysis.
Results
The best binary classifier yielded a sensitivity of 88%, specificity of 98%, and accuracy of 95%. The best three-class classifier achieved the same accuracy and correctly classified 83% (77/92) of malignant cases. The ML models yielded higher positive predictive values than Bayesian analysis based on ADA > 40 U/l and lymphocyte percentage ≥ 50% (92%).
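As a reading aid, the relationship between sensitivity, specificity, pre-test probability, and positive predictive value can be sketched with Bayes' theorem. This is a minimal illustration, not the authors' code: the function name `positive_predictive_value` is hypothetical, and the inputs in the usage lines are borrowed from the figures above purely as an example. The value it prints is not a result reported by the study.

```python
def positive_predictive_value(sensitivity: float,
                              specificity: float,
                              pretest_probability: float) -> float:
    """Post-test probability of disease given a positive result (Bayes' theorem).

    Hypothetical helper for illustration only; all arguments are
    proportions in the range [0, 1].
    """
    for value in (sensitivity, specificity, pretest_probability):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all arguments must lie between 0 and 1")

    # Probability of a positive result in a diseased patient, weighted by prevalence.
    true_positive_mass = sensitivity * pretest_probability
    # Probability of a positive result in a non-diseased patient, weighted likewise.
    false_positive_mass = (1.0 - specificity) * (1.0 - pretest_probability)

    total_positive_mass = true_positive_mass + false_positive_mass
    if total_positive_mass == 0.0:
        raise ValueError("no positive results are possible with these inputs")
    return true_positive_mass / total_positive_mass


# Illustrative inputs only (assumed pairing of figures quoted in this abstract).
ppv = positive_predictive_value(sensitivity=0.88,
                                specificity=0.98,
                                pretest_probability=0.127)
print(f"Illustrative PPV: {ppv:.3f}")
```

Because the predictive value depends on the pre-test probability, the same sensitivity and specificity give a lower positive predictive value in a setting where tuberculosis is rarer.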
Conclusions
This external validation confirms the good performance of the previously reported ML approach for predicting the diagnosis of pleural tuberculosis based on exudative and lymphocytic pleural effusions, and for discriminating the cases most likely to be malignant. Additionally, ML was more accurate than the Bayesian approach in our study.





