
Entity Anchored ICD Coding (2208.07444v1)

Published 15 Aug 2022 in cs.LG and cs.CL

Abstract: Medical coding is a complex task, requiring assignment of a subset of over 72,000 ICD codes to a patient's notes. Modern natural language processing approaches to these tasks have been challenged by the length of the input and size of the output space. We limit our model inputs to a small window around medical entities found in our documents. From those local contexts, we build contextualized representations of both ICD codes and entities, and aggregate over these representations to form document-level predictions. In contrast to existing methods which use a representation fixed either in size or by codes seen in training, we represent ICD codes by encoding the code description with local context. We discuss metrics appropriate to deploying coding systems in practice. We show that our approach is superior to existing methods in both standard and deployable measures, including performance on rare and unseen codes.
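The pipeline the abstract describes (local windows around entities, a score for each ICD code computed from its description, aggregation to a document-level prediction) can be illustrated with a minimal sketch. This is not the authors' model: the paper builds learned contextualized representations, whereas this sketch substitutes a toy token-overlap score. The function names, the `radius` parameter, the max aggregation, and the placeholder code identifiers are all assumptions made for illustration.

```python
from typing import Dict, List


def entity_windows(tokens: List[str], entity_positions: List[int],
                   radius: int = 3) -> List[List[str]]:
    """Return a token window of +/- `radius` around each entity position."""
    windows = []
    for pos in entity_positions:
        start = max(0, pos - radius)
        end = min(len(tokens), pos + radius + 1)
        windows.append(tokens[start:end])
    return windows


def overlap_score(window: List[str], description: str) -> float:
    """Toy stand-in for a learned encoder: Jaccard overlap of lowercased tokens."""
    a = {t.lower() for t in window}
    b = set(description.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def score_document(tokens: List[str], entity_positions: List[int],
                   code_descriptions: Dict[str, str],
                   radius: int = 3) -> Dict[str, float]:
    """Score every code against every entity window, then aggregate.

    Aggregation here is a max over windows (an assumption); codes are
    represented only by their text descriptions, so a code never seen
    in training can still be scored.
    """
    windows = entity_windows(tokens, entity_positions, radius)
    return {
        code: max((overlap_score(w, desc) for w in windows), default=0.0)
        for code, desc in code_descriptions.items()
    }
```

Because each code is scored from its description rather than from a fixed output layer, the set of candidate codes can be changed at prediction time without retraining, which is the property the abstract highlights for rare and unseen codes.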

Authors (5)
  1. Jay DeYoung (10 papers)
  2. Han-Chin Shing (5 papers)
  3. Luyang Kong (9 papers)
  4. Christopher Winestock (4 papers)
  5. Chaitanya Shivade (11 papers)
Citations (4)
