DeepGB-TB: A Risk-Balanced Cross-Attention Gradient-Boosted Convolutional Network for Rapid, Interpretable Tuberculosis Screening

Published 2 Aug 2025 in cs.LG, cs.AI, cs.SD, and eess.AS | (2508.02741v1)

Abstract: Large-scale tuberculosis (TB) screening is limited by the high cost and operational complexity of traditional diagnostics, creating a need for artificial-intelligence solutions. We propose DeepGB-TB, a non-invasive system that instantly assigns TB risk scores using only cough audio and basic demographic data. The model couples a lightweight one-dimensional convolutional neural network for audio processing with a gradient-boosted decision tree for tabular features. Its principal innovation is a Cross-Modal Bidirectional Cross-Attention module (CM-BCA) that iteratively exchanges salient cues between modalities, emulating the way clinicians integrate symptoms and risk factors. To meet the clinical priority of minimizing missed cases, we design a Tuberculosis Risk-Balanced Loss (TRBL) that places stronger penalties on false-negative predictions, thereby reducing high-risk misclassifications. DeepGB-TB is evaluated on a diverse dataset of 1,105 patients collected across seven countries, achieving an AUROC of 0.903 and an F1-score of 0.851, representing a new state of the art. Its computational efficiency enables real-time, offline inference directly on common mobile devices, making it ideal for low-resource settings. Importantly, the system produces clinically validated explanations that promote trust and adoption by frontline health workers. By coupling AI innovation with public-health requirements for speed, affordability, and reliability, DeepGB-TB offers a tool for advancing global TB control.


Authors (8)

Summary

The paper introduces DeepGB-TB, a framework that combines a 1D-CNN for cough audio with a LightGBM model for demographic data to enhance TB screening.
It employs a cross-modal bidirectional attention mechanism (CM-BCA) to iteratively refine audio and demographic features, mimicking clinical reasoning.
Experimental results report an AUROC of 0.903 and an F1-score of 0.851, and on-device tests support its use for rapid TB screening.


DeepGB-TB is a framework that combines deep learning with gradient-boosted decision trees for efficient, interpretable tuberculosis (TB) screening. It predicts TB risk from acoustic features of cough audio together with demographic data. Its Cross-Modal Bidirectional Cross-Attention (CM-BCA) mechanism lets audio and demographic features iteratively attend to each other, refining the joint representation in a way the authors liken to a clinician's diagnostic reasoning.

Methodology

DeepGB-TB pairs a 1D-CNN for cough audio with a Cross-Validated Probability Embedding Module (CVPEM), a LightGBM-based component for demographic data. The CVPEM transforms raw tabular data into robust, high-dimensional vectors by using cross-validated LightGBM predictions as features.
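The idea of using cross-validated predictions as features can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes "cross-validated" means out-of-fold scoring, it swaps in a simple nearest-centroid scorer where the paper uses LightGBM, and it stops at a single probability column rather than the high-dimensional vector the CVPEM produces. The names `centroid_scorer` and `out_of_fold_scores` and the synthetic data are hypothetical.

```python
import numpy as np

def centroid_scorer(X_train, y_train, X_test):
    """Stand-in model (NOT LightGBM): score by relative distance to class centroids."""
    mu_pos = X_train[y_train == 1].mean(axis=0)
    mu_neg = X_train[y_train == 0].mean(axis=0)
    d_pos = np.linalg.norm(X_test - mu_pos, axis=1)
    d_neg = np.linalg.norm(X_test - mu_neg, axis=1)
    # Closer to the positive centroid gives a score nearer 1.
    return d_neg / (d_pos + d_neg + 1e-12)

def out_of_fold_scores(X, y, fit_predict, n_splits=5, seed=0):
    """Score every sample with a model that never saw it during fitting."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_splits)
    oof = np.zeros(len(X))
    for k in range(n_splits):
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_splits) if j != k])
        oof[test_idx] = fit_predict(X[train_idx], y[train_idx], X[test_idx])
    return oof

# Synthetic stand-in for demographic features (assumption: 4 numeric columns).
rng = np.random.default_rng(42)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 4)) + y[:, None] * 1.5

oof = out_of_fold_scores(X, y, centroid_scorer)
embedding = oof[:, None]  # one probability-like feature per patient
```

Because each score comes from a model fitted on the other folds, the resulting feature does not leak a sample's own label into the downstream network, which is the usual motivation for out-of-fold stacking.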

1D-CNN and CVPEM Integration:

Figure 1: The architecture of DeepGB-TB.

The audio branch takes Mel-frequency cepstral coefficients (MFCCs) and supplementary features (chroma, spectral centroid) as inputs to a 1D-CNN, while a LightGBM model processes demographic data under cross-validation to yield a stable probability embedding. This configuration is intended to capture the multiplicative interplay between the modalities, relating patient profiles to audio signatures.
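To make the audio branch concrete, the sketch below applies a single 1D convolution over a sequence of per-frame features, followed by ReLU and global average pooling. It is an assumption-laden illustration: the layer count, kernel size, channel widths, and pooling of the actual DeepGB-TB network are not given in this summary, and the 13-coefficient, 100-frame input is a hypothetical stand-in for real MFCCs. The function names are also hypothetical.

```python
import numpy as np

def conv1d(x, weight, bias):
    """Valid 1D convolution (cross-correlation) along the time axis.

    x:      (in_channels, time)
    weight: (out_channels, in_channels, kernel)
    bias:   (out_channels,)
    """
    out_ch, _, k = weight.shape
    steps = x.shape[1] - k + 1
    out = np.empty((out_ch, steps))
    for t in range(steps):
        window = x[:, t:t + k]  # (in_channels, kernel)
        out[:, t] = np.tensordot(weight, window, axes=([1, 2], [0, 1])) + bias
    return out

def audio_embedding(features, weight, bias):
    """Conv -> ReLU -> global average pooling over time."""
    hidden = np.maximum(conv1d(features, weight, bias), 0.0)
    return hidden.mean(axis=1)

# Hypothetical input: 13 feature channels over 100 frames.
rng = np.random.default_rng(0)
features = rng.normal(size=(13, 100))
weight = rng.normal(size=(8, 13, 5)) * 0.1
bias = np.zeros(8)

emb = audio_embedding(features, weight, bias)  # fixed-length vector, shape (8,)
```

Treating each coefficient as a channel and convolving only along time is what makes the network one-dimensional; pooling over time then yields a fixed-length vector regardless of cough duration.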

CM-BCA Mechanism:

CM-BCA iteratively refines the audio and tabular feature representations by having each modality cross-attend over the other. This process, shown in Figure 2 as a series of multi-head attention layers, applies attention and feed-forward layers in succession, interleaved with layer normalization. The result is an efficient fusion that supports a holistic assessment of patient data.

Figure 2: The process of CM-BCA.
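A simplified sketch of bidirectional cross-attention is given below. It is not the paper's CM-BCA: it uses a single attention head, omits the feed-forward sublayers, and reuses the same weights on every round, all of which are simplifying assumptions. Token counts, the embedding width of 16, and the names `cross_attend` and `bidirectional_block` are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def cross_attend(queries, context, Wq, Wk, Wv):
    """Queries come from one modality; keys and values come from the other."""
    Q, K, V = queries @ Wq, context @ Wk, context @ Wv
    weights = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
    return weights @ V, weights

def bidirectional_block(audio, tabular, params):
    """Both directions read the previous round's features, then add and normalize."""
    a_upd, a_w = cross_attend(audio, tabular, *params["a2t"])
    t_upd, t_w = cross_attend(tabular, audio, *params["t2a"])
    return layer_norm(audio + a_upd), layer_norm(tabular + t_upd), a_w, t_w

rng = np.random.default_rng(0)
d = 16
audio = rng.normal(size=(10, d))    # 10 audio tokens (assumption)
tabular = rng.normal(size=(3, d))   # 3 tabular tokens (assumption)
params = {
    "a2t": [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3)],
    "t2a": [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3)],
}

for _ in range(2):  # two refinement rounds
    audio, tabular, audio_to_tab, tab_to_audio = bidirectional_block(audio, tabular, params)
```

The attention weight matrices (`audio_to_tab`, `tab_to_audio`) are the kind of quantity that can be visualized as a heatmap to show which features of one modality informed the other.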

Loss Function and Optimization:

To address the need for high sensitivity in TB detection, the model is trained with a Tuberculosis Risk-Balanced Loss (TRBL). This loss assigns a higher penalty to false negatives, steering the model toward higher recall. The hyperparameter is set to $\lambda = 3$, the value that gave the best balance between sensitivity and specificity in the empirical evaluations.
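The exact form of the TRBL is not given in this summary. One common way to penalize false negatives more heavily, assumed here purely for illustration, is a binary cross-entropy in which the positive-class term is scaled by $\lambda$: $\mathcal{L} = -\frac{1}{N}\sum_i \left[\lambda\, y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$. The function name `risk_balanced_bce` is hypothetical.

```python
import math

def risk_balanced_bce(y_true, p_pred, lam=3.0, eps=1e-12):
    """Weighted binary cross-entropy (assumed form, not the paper's definition).

    The positive-class term is scaled by lam, so a confident miss on a
    TB-positive sample costs lam times more than an equally confident false alarm.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(lam * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# Two equally confident mistakes with opposite clinical consequences.
missed_tb = risk_balanced_bce([1], [0.1])    # TB patient scored as low risk
false_alarm = risk_balanced_bce([0], [0.9])  # healthy patient scored as high risk
```

Under this assumed form, `missed_tb` is $\lambda$ times `false_alarm`, and setting `lam=1.0` recovers the standard binary cross-entropy.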

Experimental Results

DeepGB-TB was evaluated on a dataset of 1,105 participants from seven countries, with a balanced TB and non-TB distribution. The method achieved an AUROC of 0.903 and an F1-score of 0.851, outperforming the baseline models.

Figure 3: Comparison of Model Training and Validation Loss. The x-axis denotes epochs for DeepGB-TB and training steps for Qwen-Omni.

The ablation studies indicate that each component contributes to performance. Removing the CM-BCA reduced AUROC by 1.3%, underscoring its importance. On-device tests further supported the model's suitability for real-time use, showing efficient inference on edge devices.

Discussion and Implications

DeepGB-TB is an AI system tailored to TB screening that integrates multimodal data and reaches state-of-the-art screening performance on this dataset through architectural choices such as CM-BCA and TRBL. While the current results indicate high accuracy and efficiency, prospective studies across varied settings are still needed to establish its clinical utility.

Figure 4: Attention heatmap over input features.

Given its design, the framework is a promising tool for overcoming barriers to accessible TB diagnosis, especially in resource-limited settings. Future work might expand dataset diversity or integrate additional modalities to improve diagnostic robustness.

Conclusion

The DeepGB-TB framework shows the value of combining convolutional neural networks with gradient-boosted decision trees and a cross-attention fusion mechanism for rapid, mobile-based tuberculosis screening. Despite these strengths, continued validation across broader datasets and settings remains essential before widespread clinical deployment.
