The paper "TransICD: Transformer Based Code-wise Attention Model for Explainable ICD Coding" tackles the challenge of automating the assignment of International Classification of Diseases (ICD) codes to clinical notes using a machine learning approach, specifically leveraging transformer-based models. The manual assignment of ICD codes, which are crucial for purposes such as billing and epidemiological studies, can be error-prone and resource-intensive. This necessitates an automated solution that can enhance accuracy and reduce workload.
Key Contributions
- Transformer-based Model for ICD Coding:
- The authors propose a novel model architecture, "TransICD", which captures the interdependence among tokens in a clinical document with a transformer encoder. A code-wise attention mechanism then learns code-specific document representations, enabling more precise predictions.
- Handling High-dimensional Label Spaces:
- ICD coding is inherently a multi-label classification problem: each clinical note can be assigned several of the more than 15,000 possible ICD-9 codes, many of which correspond to rare diseases. The model employs structured self-attention to aggregate token representations into label-specific document representations (see the code-wise attention sketch in the Methodology section below).
- Mitigating Imbalance with LDAM Loss:
- To address this imbalance, where many ICD codes appear only infrequently, the paper adopts the Label Distribution Aware Margin (LDAM) loss. LDAM enforces a larger margin for rarer labels, adjusting the decision boundary according to the label distribution and improving predictive performance on rare codes (a sketch of a multi-label adaptation follows this list).
- Evaluation and Performance:
- The model is evaluated on the MIMIC-III dataset, a standard benchmark for medical code prediction from clinical records. In the reported experiments, TransICD achieved a micro-AUC of 0.923, surpassing baselines such as bidirectional recurrent neural networks (Bi-RNNs) at 0.868. The gain is attributed to the transformer's ability to capture long-range dependencies and contextual information (a metric-computation example also follows this list).
- Interpretability via Code-wise Attention:
- Beyond raw performance metrics, the paper emphasizes interpretability, a crucial requirement for clinical adoption. Through attention visualizations, TransICD highlights the parts of the text that contribute most to each code prediction, offering clinicians transparency and supplementary insight.
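The LDAM loss was originally proposed for single-label classification (Cao et al., 2019); below is a minimal sketch of a multi-label adaptation in PyTorch. The per-label training counts `label_counts` and the hyperparameters `max_margin` and `scale` are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLabelLDAMLoss(nn.Module):
    """LDAM loss (Cao et al., 2019) adapted to multi-label ICD coding.

    Each label j gets a margin proportional to 1 / n_j**0.25, so rare
    codes receive a larger margin. Expects float multi-hot targets.
    """
    def __init__(self, label_counts, max_margin=0.5, scale=30.0):
        super().__init__()
        margins = 1.0 / torch.tensor(label_counts, dtype=torch.float).pow(0.25)
        margins = margins * (max_margin / margins.max())  # largest margin = max_margin
        self.register_buffer("margins", margins)
        self.scale = scale

    def forward(self, logits, targets):
        # Subtract the per-label margin from the logit only where the code is
        # present, pushing the boundary away from rare positive examples.
        shifted = logits - self.margins * targets
        return F.binary_cross_entropy_with_logits(self.scale * shifted, targets)

# e.g., loss_fn = MultiLabelLDAMLoss(label_counts=[5000, 120, 37])
```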
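For reference, the reported AUC and F1 metrics can be computed with scikit-learn. The arrays below are toy values for illustration only, not results from the paper.

```python
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score

# Toy multi-hot ground truth and predicted probabilities
# for three notes and three codes.
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_prob = np.array([[0.9, 0.2, 0.7], [0.1, 0.8, 0.3], [0.6, 0.7, 0.2]])

micro_auc = roc_auc_score(y_true, y_prob, average="micro")
macro_auc = roc_auc_score(y_true, y_prob, average="macro")
micro_f1 = f1_score(y_true, (y_prob >= 0.5).astype(int), average="micro")
print(f"micro-AUC={micro_auc:.3f}  macro-AUC={macro_auc:.3f}  micro-F1={micro_f1:.3f}")
```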
Methodology
- Embedding Layer:
- Clinical notes are embedded with pre-trained word embeddings, representing each document as a sequence of vectors (first sketch after this list).
- Transformer Encoder Layer:
- These embeddings are fed into a multi-head self-attention transformer encoder, producing context-rich token representations that handle long sequences effectively (second sketch).
- Attention Mechanism:
- A code-wise attention mechanism computes token-level attention scores for each code, in effect creating a tailored document representation per code (third sketch).
- Output Layer:
- Code-specific representations are passed through dense output layers to obtain per-code predictions. The LDAM loss is integrated to ensure robustness against the imbalanced label distribution (final sketch).
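The sketches below walk through this pipeline step by step; all dimensions, module names, and hyperparameters are assumptions for illustration, not the paper's exact configuration. First, the embedding layer:

```python
import torch
import torch.nn as nn

# Load pre-trained word vectors (e.g., word2vec trained on clinical text)
# into an embedding layer; `pretrained_matrix` stands in for the real
# (vocab_size, embed_dim) matrix aligned with the tokenizer's vocabulary.
pretrained_matrix = torch.randn(50000, 128)  # placeholder values
embedding = nn.Embedding.from_pretrained(pretrained_matrix, freeze=False, padding_idx=0)

token_ids = torch.randint(1, 50000, (1, 512))  # one note of 512 tokens
x = embedding(token_ids)                       # (1, 512, 128): a sequence of vectors
```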
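Next, the transformer encoder; PyTorch's built-in modules suffice for a sketch (positional encodings, which the transformer needs in order to see token order, are omitted here for brevity):

```python
import torch
import torch.nn as nn

# Continuing from the embedding sketch; a random tensor stands in for x
# so this snippet runs on its own.
x = torch.randn(1, 512, 128)

encoder_layer = nn.TransformerEncoderLayer(
    d_model=128, nhead=8, dim_feedforward=512, batch_first=True
)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
h = encoder(x)  # (1, 512, 128): contextualized token representations
```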
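Then the code-wise attention: each code c has a learned query vector u_c, attention weights alpha_c = softmax(H u_c) are computed over the tokens, and the weighted sum v_c = sum_t alpha_{c,t} h_t gives a code-specific document vector. A sketch, with module and variable names assumed:

```python
import torch
import torch.nn as nn

class CodeWiseAttention(nn.Module):
    """One learned query vector per ICD code; returns a code-specific
    document vector for every code, plus the attention weights."""
    def __init__(self, embed_dim, num_codes):
        super().__init__()
        self.U = nn.Linear(embed_dim, num_codes, bias=False)  # one query per code

    def forward(self, h):                           # h: (batch, seq_len, embed_dim)
        alpha = torch.softmax(self.U(h), dim=1)     # attention over tokens, per code
        v = torch.einsum("bse,bsc->bce", h, alpha)  # (batch, num_codes, embed_dim)
        return v, alpha.transpose(1, 2)             # weights: (batch, num_codes, seq_len)

attn = CodeWiseAttention(embed_dim=128, num_codes=50)  # e.g., a 50-code label set
v, alpha = attn(torch.randn(1, 512, 128))
```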
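Finally, the output layer scores each code's document vector with its own weight vector; the resulting logits feed the LDAM loss sketched earlier. Again, names and dimensions are illustrative:

```python
import torch
import torch.nn as nn

class CodeScorer(nn.Module):
    """Per-code linear head: maps each code-specific vector to one logit."""
    def __init__(self, embed_dim, num_codes):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_codes, embed_dim) * 0.01)
        self.bias = nn.Parameter(torch.zeros(num_codes))

    def forward(self, v):                                 # v: (batch, num_codes, embed_dim)
        return (v * self.weight).sum(dim=-1) + self.bias  # (batch, num_codes) logits

scorer = CodeScorer(embed_dim=128, num_codes=50)
logits = scorer(torch.randn(1, 50, 128))
probs = torch.sigmoid(logits)  # independent per-code probabilities
```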
Results and Analysis
- The TransICD model shows notable improvements across several metrics (macro and micro AUC, F1 scores) over established baselines such as CNNs, LSTMs, and other attention-based models.
- An in-depth analysis shows that the model handles both frequent and infrequent codes more effectively, indicating a strong grasp of the dataset's underlying distribution and semantics, aided by the LDAM loss.
- Visual attention maps trace prediction decisions back to the input text, facilitating verification and corroboration in a clinical setting (a small extraction helper is sketched below).
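As a sketch of how such attention maps can be surfaced, the helper below (hypothetical names, not the paper's code) pulls out the most-attended tokens for a given code from the weights returned by the attention module sketched above:

```python
import torch

def top_attended_tokens(tokens, alpha, code_idx, k=5):
    """Return the k (token, weight) pairs with the highest attention for one code.

    `tokens` is the list of input word strings for one document; `alpha` is
    that document's (num_codes, seq_len) attention-weight matrix.
    """
    weights = alpha[code_idx, : len(tokens)]
    topk = torch.topk(weights, k=min(k, len(tokens)))
    return [(tokens[i], weights[i].item()) for i in topk.indices.tolist()]

# e.g., top_attended_tokens(note_tokens, alpha[0], code_idx=3)
```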
The paper effectively demonstrates that leveraging transformers, equipped with a code-wise attention mechanism and loss functions designed for imbalanced data, significantly enhances the automation of ICD coding. This approach not only improves predictive performance but also integrates a level of interpretability essential for real-world applications in the medical domain.