Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
102 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Hierarchical Deep Learning Ensemble to Automate the Classification of Breast Cancer Pathology Reports by ICD-O Topography (2008.12571v1)

Published 28 Aug 2020 in cs.LG, cs.CV, and eess.IV

Abstract: Like most global cancer registries, the National Cancer Registry in South Africa employs expert human coders to label pathology reports using appropriate International Classification of Disease for Oncology (ICD-O) codes spanning 42 different cancer types. The annotation is extensive for the large volume of cancer pathology reports the registry receives annually from public and private sector institutions. This manual process, coupled with other challenges results in a significant 4-year lag in reporting of annual cancer statistics in South Africa. We present a hierarchical deep learning ensemble method incorporating state of the art convolutional neural network models for the automatic labelling of 2201 de-identified, free text pathology reports, with appropriate ICD-O breast cancer topography codes across 8 classes. Our results show an improvement in primary site classification over the state of the art CNN model by greater than 14% for F1 micro and 55% for F1 macro scores. We demonstrate that the hierarchical deep learning ensemble improves on state-of-the-art models for ICD-O topography classification in comparison to a flat multiclass model for predicting ICD-O topography codes for pathology reports.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (4)
  1. Waheeda Saib (3 papers)
  2. David Sengeh (2 papers)
  3. Gcininwe Dlamini (1 paper)
  4. Elvira Singh (2 papers)
Citations (4)

Summary

We haven't generated a summary for this paper yet.