
Text2Tree: Aligning Text Representation to the Label Tree Hierarchy for Imbalanced Medical Classification (2311.16650v1)

Published 28 Nov 2023 in cs.CL

Abstract: Deep learning approaches achieve promising performance on a variety of text tasks. However, they still struggle with medical text classification, where samples are often extremely imbalanced and scarce. Unlike existing mainstream approaches, which supplement semantics with external medical information, this paper rethinks the data challenges in medical texts and presents Text2Tree, a novel framework-agnostic algorithm that uses only the internal label hierarchy when training deep learning models. We embed the ICD-code tree structure of labels into cascade attention modules to learn hierarchy-aware label representations. Two new learning schemes, Similarity Surrogate Learning (SSL) and Dissimilarity Mixup Learning (DML), boost text classification by, respectively, reusing and distinguishing samples of other labels according to the label representation hierarchy. Experiments on authoritative public datasets and real-world medical records show that our approach consistently outperforms classical and advanced imbalanced-classification methods.
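The abstract does not spell out how the label tree enters training, so the following is only an illustrative sketch, not the authors' implementation: it assumes labels carry root-to-leaf paths (as ICD code prefixes do), measures label similarity by tree distance, and uses that distance to bias a standard mixup interpolation toward the anchor sample when labels are far apart. The function names `tree_distance` and `dissimilarity_mixup` and the normalization scheme are assumptions for illustration.

```python
import numpy as np

def tree_distance(path_a, path_b):
    """Number of hops between two labels in a label tree, given their
    root-to-leaf paths (e.g. ICD code prefixes). The length of the shared
    prefix locates the lowest common ancestor."""
    common = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)

def dissimilarity_mixup(x_a, x_b, path_a, path_b, max_depth=4, alpha=0.4):
    """Hypothetical hierarchy-aware mixup step: the farther apart the two
    labels sit in the tree, the more the mix is biased toward the anchor
    sample x_a. Returns the mixed features and the mixing weight."""
    lam = np.random.beta(alpha, alpha)
    # Normalize tree distance to [0, 1]; 2 * max_depth is the largest
    # possible hop count between two leaves of depth max_depth.
    d = tree_distance(path_a, path_b) / (2 * max_depth)
    lam = max(lam, d)  # hierarchy-aware floor on the anchor's weight
    return lam * x_a + (1 - lam) * x_b, lam
```

The key design choice in this sketch is the floor `lam = max(lam, d)`: mixing two samples whose labels share a deep common ancestor behaves like vanilla mixup, while samples from distant subtrees are only lightly blended, keeping the anchor's label dominant.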

Citations (3)
