The paper "TransICD: Transformer Based Code-wise Attention Model for Explainable ICD Coding" tackles the challenge of automating the assignment of International Classification of Diseases (ICD) codes to clinical notes using a machine learning approach, specifically leveraging transformer-based models. The manual assignment of ICD codes, which are crucial for purposes such as billing and epidemiological studies, can be error-prone and resource-intensive. This necessitates an automated solution that can enhance accuracy and reduce workload.
Key Contributions
- Transformer-based Model for ICD Coding:
- The authors propose a novel model architecture, "TransICD", which captures the interdependence among tokens in a clinical document with a transformer encoder. A code-wise attention mechanism then learns code-specific document representations, enabling more precise predictions.
- Handling High-dimensional Label Spaces:
- ICD coding is inherently a multi-label classification problem: each clinical note can be assigned several of the more than 15,000 possible ICD-9 codes, many of which correspond to rare diseases. The model employs structured self-attention to aggregate token representations into label-specific document representations (see the code-wise attention sketch in the Methodology section below).
- Mitigating Imbalance with LDAM Loss:
- To address this imbalance, where many ICD codes appear only infrequently, the paper adopts the Label Distribution Aware Margin (LDAM) loss. LDAM enforces a larger margin for rarer labels, adjusting the decision boundary according to the label distribution and improving predictive performance on rare codes (a sketch of a multi-label adaptation follows this list).
- Evaluation and Performance:
- The model is evaluated on the MIMIC-III dataset, a standard benchmark for medical code prediction from clinical records. In the reported experiments, TransICD achieved a micro-AUC of 0.923, surpassing baselines such as bidirectional recurrent neural networks (Bi-RNNs) at 0.868. The gain is attributed to the transformer's ability to capture long-range dependencies and contextual information (a metric-computation example also follows this list).
- Interpretability via Code-wise Attention:
- Beyond raw performance metrics, the paper emphasizes interpretability, a crucial requirement for clinical adoption. Through attention visualizations, TransICD highlights the parts of the text that contribute most to each code prediction, offering clinicians transparency and supplementary insight.
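The LDAM loss was originally proposed for single-label classification (Cao et al., 2019); below is a minimal sketch of a multi-label adaptation in PyTorch. The per-label training counts `label_counts` and the hyperparameters `max_margin` and `scale` are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLabelLDAMLoss(nn.Module):
    """LDAM loss (Cao et al., 2019) adapted to multi-label ICD coding.

    Each label j gets a margin proportional to 1 / n_j**0.25, so rare
    codes receive a larger margin. Expects float multi-hot targets.
    """
    def __init__(self, label_counts, max_margin=0.5, scale=30.0):
        super().__init__()
        margins = 1.0 / torch.tensor(label_counts, dtype=torch.float).pow(0.25)
        margins = margins * (max_margin / margins.max())  # largest margin = max_margin
        self.register_buffer("margins", margins)
        self.scale = scale

    def forward(self, logits, targets):
        # Subtract the per-label margin from the logit only where the code is
        # present, pushing the boundary away from rare positive examples.
        shifted = logits - self.margins * targets
        return F.binary_cross_entropy_with_logits(self.scale * shifted, targets)

# e.g., loss_fn = MultiLabelLDAMLoss(label_counts=[5000, 120, 37])
```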
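For reference, the reported AUC and F1 metrics can be computed with scikit-learn. The arrays below are toy values for illustration only, not results from the paper.

```python
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score

# Toy multi-hot ground truth and predicted probabilities
# for three notes and three codes.
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_prob = np.array([[0.9, 0.2, 0.7], [0.1, 0.8, 0.3], [0.6, 0.7, 0.2]])

micro_auc = roc_auc_score(y_true, y_prob, average="micro")
macro_auc = roc_auc_score(y_true, y_prob, average="macro")
micro_f1 = f1_score(y_true, (y_prob >= 0.5).astype(int), average="micro")
print(f"micro-AUC={micro_auc:.3f}  macro-AUC={macro_auc:.3f}  micro-F1={micro_f1:.3f}")
```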
Methodology
- Embedding Layer:
- Clinical notes are embedded with pre-trained word embeddings, representing each document as a sequence of vectors (first sketch after this list).
- Transformer Encoder Layer:
- These embeddings are fed into a multi-head self-attention transformer encoder, producing context-rich token representations that handle long sequences effectively (second sketch).
- Attention Mechanism:
- A code-wise attention mechanism computes token-level attention scores for each code, in effect creating a tailored document representation per code (third sketch).
- Output Layer:
- Code-specific representations are passed through dense output layers to obtain per-code predictions. The LDAM loss is integrated to ensure robustness against the imbalanced label distribution (final sketch).
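The sketches below walk through this pipeline step by step; all dimensions, module names, and hyperparameters are assumptions for illustration, not the paper's exact configuration. First, the embedding layer:

```python
import torch
import torch.nn as nn

# Load pre-trained word vectors (e.g., word2vec trained on clinical text)
# into an embedding layer; `pretrained_matrix` stands in for the real
# (vocab_size, embed_dim) matrix aligned with the tokenizer's vocabulary.
pretrained_matrix = torch.randn(50000, 128)  # placeholder values
embedding = nn.Embedding.from_pretrained(pretrained_matrix, freeze=False, padding_idx=0)

token_ids = torch.randint(1, 50000, (1, 512))  # one note of 512 tokens
x = embedding(token_ids)                       # (1, 512, 128): a sequence of vectors
```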
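Next, the transformer encoder; PyTorch's built-in modules suffice for a sketch (positional encodings, which the transformer needs in order to see token order, are omitted here for brevity):

```python
import torch
import torch.nn as nn

# Continuing from the embedding sketch; a random tensor stands in for x
# so this snippet runs on its own.
x = torch.randn(1, 512, 128)

encoder_layer = nn.TransformerEncoderLayer(
    d_model=128, nhead=8, dim_feedforward=512, batch_first=True
)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
h = encoder(x)  # (1, 512, 128): contextualized token representations
```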
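Then the code-wise attention: each code c has a learned query vector u_c, attention weights alpha_c = softmax(H u_c) are computed over the tokens, and the weighted sum v_c = sum_t alpha_{c,t} h_t gives a code-specific document vector. A sketch, with module and variable names assumed:

```python
import torch
import torch.nn as nn

class CodeWiseAttention(nn.Module):
    """One learned query vector per ICD code; returns a code-specific
    document vector for every code, plus the attention weights."""
    def __init__(self, embed_dim, num_codes):
        super().__init__()
        self.U = nn.Linear(embed_dim, num_codes, bias=False)  # one query per code

    def forward(self, h):                           # h: (batch, seq_len, embed_dim)
        alpha = torch.softmax(self.U(h), dim=1)     # attention over tokens, per code
        v = torch.einsum("bse,bsc->bce", h, alpha)  # (batch, num_codes, embed_dim)
        return v, alpha.transpose(1, 2)             # weights: (batch, num_codes, seq_len)

attn = CodeWiseAttention(embed_dim=128, num_codes=50)  # e.g., a 50-code label set
v, alpha = attn(torch.randn(1, 512, 128))
```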
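Finally, the output layer scores each code's document vector with its own weight vector; the resulting logits feed the LDAM loss sketched earlier. Again, names and dimensions are illustrative:

```python
import torch
import torch.nn as nn

class CodeScorer(nn.Module):
    """Per-code linear head: maps each code-specific vector to one logit."""
    def __init__(self, embed_dim, num_codes):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_codes, embed_dim) * 0.01)
        self.bias = nn.Parameter(torch.zeros(num_codes))

    def forward(self, v):                                 # v: (batch, num_codes, embed_dim)
        return (v * self.weight).sum(dim=-1) + self.bias  # (batch, num_codes) logits

scorer = CodeScorer(embed_dim=128, num_codes=50)
logits = scorer(torch.randn(1, 50, 128))
probs = torch.sigmoid(logits)  # independent per-code probabilities
```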
Results and Analysis
- The TransICD model shows notable improvements across several metrics (macro and micro AUC, F1 scores) over established baselines such as CNNs, LSTMs, and other attention-based models.
- An in-depth analysis shows that the model handles both frequent and infrequent codes more effectively, indicating a strong grasp of the dataset's underlying distribution and semantics, aided by the LDAM loss.
- Visual attention maps trace prediction decisions back to the input text, facilitating verification and corroboration in a clinical setting (a small extraction helper is sketched below).
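As a sketch of how such attention maps can be surfaced, the helper below (hypothetical names, not the paper's code) pulls out the most-attended tokens for a given code from the weights returned by the attention module sketched above:

```python
import torch

def top_attended_tokens(tokens, alpha, code_idx, k=5):
    """Return the k (token, weight) pairs with the highest attention for one code.

    `tokens` is the list of input word strings for one document; `alpha` is
    that document's (num_codes, seq_len) attention-weight matrix.
    """
    weights = alpha[code_idx, : len(tokens)]
    topk = torch.topk(weights, k=min(k, len(tokens)))
    return [(tokens[i], weights[i].item()) for i in topk.indices.tolist()]

# e.g., top_attended_tokens(note_tokens, alpha[0], code_idx=3)
```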
The paper effectively demonstrates that leveraging transformers, equipped with a code-wise attention mechanism and loss functions designed for imbalanced data, significantly enhances the automation of ICD coding. This approach not only improves predictive performance but also integrates a level of interpretability essential for real-world applications in the medical domain.