Rule-Based Tumor Staging

Updated 1 December 2025

Rule-based tumor staging is a deterministic system that applies clearly defined if–then rules based on TNM criteria to assign cancer stages.
It integrates imaging data, pathology reports, and NLP methods to extract key metrics, ensuring decisions are auditable and reproducible.
The approach enhances clinical trust and regulatory compliance by exposing every step of the decision logic and enabling automated re-staging.

Rule-based tumor staging refers to the deterministic assignment of cancer stages to cases based on patient- or image-derived features using explicit, human-interpretable if–then rules. These systems codify clinical guidelines—most commonly the TNM (Tumor, Node, Metastasis) frameworks or morphometric surrogates—into logical statements, decision tables, or ontological axioms, providing transparent, auditable, and reproducible staging aligned with authoritative criteria. Rule-based approaches contrast with pure statistical or end-to-end deep learning methods by exposing all decision logic, often facilitating both regulatory compliance and clinical trust.

1. Principles of Rule-Based Staging

Rule-based staging calculates cancer stage through a cascade of human-authored, formally specified predicates derived from established clinical protocols. The canonical basis is the TNM system, in which the 'T' category encodes the primary tumor's size and anatomical extent, 'N' the degree of regional lymph node involvement, and 'M' the presence or absence of distant metastases. Stage groups are then mapped from combinations of T, N, and M categories using decision rules or tables, sometimes augmented by prognostic variables such as hormone receptor status or histological grade (Seneviratne et al., 2018; Moret-Bonillo et al., 2023).

The approach is fundamentally symbolic: explicit rules map measurable input values (e.g., tumor diameter $d$ or number of positive nodes $n$ ) to discrete stage labels, with logic directly traceable back to clinical guidelines. Formally, a rule may be encoded in predicate or description logic, as in

$\text{Stage IA} \equiv (T1 \land N0 \land M0)$

or in decision table form (see, e.g., Table 3 of Moret-Bonillo et al., 2023).
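A decision table of this kind can be sketched as a plain lookup keyed by the TNM triple. The Python below is a minimal illustration, not the cited authors' implementation: it encodes only the Stage IA rule quoted above, and the names `STAGE_TABLE` and `stage_group` are hypothetical. A real table would list every TNM combination defined by the applicable guideline.

```python
from typing import Optional

# Decision table keyed by (T, N, M). Only the Stage IA row comes from the text;
# a complete table would contain every combination defined by the guideline.
STAGE_TABLE = {
    ("T1", "N0", "M0"): "Stage IA",
}


def stage_group(t: str, n: str, m: str) -> Optional[str]:
    """Return the stage group for a TNM triple, or None if no row matches."""
    return STAGE_TABLE.get((t, n, m))
```

Returning `None` for an uncovered combination, rather than a default stage, keeps gaps in the table visible to a reviewer.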

2. Programmable Staging in Imaging and Pathology

Modern rule-based frameworks integrate structured measurement extraction—often via segmentation or information extraction modules—with downstream rule application:

Anatomy-aware segmentation pipelines (lung cancer): Encoder-decoder networks segment CT volumes into tumor, lung parenchyma, mediastinum, and diaphragm masks. Image-derived quantities are computed, such as

$D_{contour} = \max_{i,j\in S_{tumor}} \|x_i - x_j\|_2 \times \Delta x$

and rules for stage assignment are evaluated in a fixed, mutually exclusive order, e.g.:

$\text{If}~(D_{max} > 7.0)~\lor~(d_{\min}^{(\mathrm{mediastinum})}=0)~\lor~(d_{\min}^{(\mathrm{diaphragm})}=0):~T4$

with fallback to T1, T2, or T3 according to the diameter thresholds. The rule logic follows the pseudocode given in (Chowdhury et al., 24 Nov 2025).

NLP-based rule induction from pathology reports: Staging rules are induced by LLMs using chain-of-thought or retrieval-augmented prompting from free-text reports or external guidelines, producing interpretable, numbered rule sets for subsequent application. Workflow pseudocode and benchmarking are detailed in (Lee et al., 2 Nov 2025).
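The maximal-diameter formula in the imaging pipeline above can be illustrated with a brute-force computation over tumor voxel coordinates. This is a sketch under stated assumptions: voxel spacing is isotropic (a single $\Delta x$, as in the formula), the mask is supplied as a list of integer voxel indices, and the function name `max_diameter` is hypothetical rather than taken from the cited work.

```python
from itertools import combinations
from math import dist
from typing import Iterable, Tuple

Voxel = Tuple[int, int, int]


def max_diameter(tumor_voxels: Iterable[Voxel], spacing: float) -> float:
    """Largest pairwise Euclidean distance between tumor voxels, scaled by spacing."""
    points = list(tumor_voxels)
    if len(points) < 2:
        return 0.0  # a single voxel (or empty mask) has no extent under this definition
    longest = max(dist(a, b) for a, b in combinations(points, 2))
    return longest * spacing


# Three voxels with 0.1 cm spacing: the farthest pair is (0,0,0) and (3,4,0).
diameter_cm = max_diameter([(0, 0, 0), (3, 4, 0), (1, 1, 0)], spacing=0.1)
```

The pairwise search is quadratic in the number of voxels, so a practical pipeline would likely restrict it to boundary voxels or a convex hull; that optimization is omitted here.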

3. Formalization: Logical and Ontological Frameworks

Ontological rule-based approaches formally encode staging criteria using OWL, Turtle, SPARQL, and description logic, enabling direct mapping of patient records to stage labels through automated reasoning:

OWL/description logic: Each rule is formalized as a class equivalence axiom, e.g.,

$\text{AJCC8\_Stage\_IA} \equiv T1 \land N0 \land M0 \land Grade1 \land HER2^{-} \land ER^{-} \land PR^{+}$

with instance-level assignment automated via SPARQL-based inference agents (Seneviratne et al., 2018).

Automated re-staging: Modular ontologies allow rapid update for new staging editions; on reloading new guidelines as ontologies, previous stage assignments are efficiently replaced across patient cohorts.
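The two ideas above, axioms as conjunctions and re-staging by swapping the rule set, can be approximated in plain Python. This sketch is a closed-world dictionary match, not OWL reasoning or SPARQL inference. The `AJCC8` entry mirrors the Stage IA axiom quoted above; `EDITION_X` is an invented placeholder edition used only to show the re-staging mechanics, and all function and attribute names are hypothetical.

```python
from typing import Dict, List, Optional

Axioms = Dict[str, Dict[str, str]]  # stage label -> required attribute values

# "AJCC8" holds the Stage IA axiom quoted in the text. "EDITION_X" is a made-up
# placeholder edition for demonstration only; it is not a real guideline.
EDITIONS: Dict[str, Axioms] = {
    "AJCC8": {
        "Stage IA": {"T": "T1", "N": "N0", "M": "M0", "grade": "1",
                     "HER2": "-", "ER": "-", "PR": "+"},
    },
    "EDITION_X": {
        "Stage IA": {"T": "T1", "N": "N0", "M": "M0"},
    },
}


def classify(record: Dict[str, str], axioms: Axioms) -> Optional[str]:
    """Return the first stage whose required attributes all match the record."""
    for stage, required in axioms.items():
        if all(record.get(key) == value for key, value in required.items()):
            return stage
    return None


def restage(cohort: List[Dict[str, str]], edition: str) -> List[Optional[str]]:
    """Re-run classification for every record under the chosen edition."""
    return [classify(record, EDITIONS[edition]) for record in cohort]


cohort = [
    {"T": "T1", "N": "N0", "M": "M0", "grade": "1", "HER2": "-", "ER": "-", "PR": "+"},
    {"T": "T1", "N": "N0", "M": "M0", "grade": "2", "HER2": "+", "ER": "+", "PR": "+"},
]
```

Calling `restage(cohort, "AJCC8")` and then `restage(cohort, "EDITION_X")` reassigns the whole cohort without touching the patient records, which is the property the modular-ontology design aims for.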

4. Methodologies for Measurement Extraction

Rule-based staging requires precise quantification of input features consistent with the logic specification:

Imaging-derived metrics: Extracted from segmentation, these include geometric computations (e.g., maximal in-plane and through-slice tumor diameters, minimal distances to anatomical structures) calculated directly on binary masks (Chowdhury et al., 24 Nov 2025).
Fractal morphometry: In histopathological analysis, features such as the box-counting fractal dimension $D_{f}$ of tissue mass-density images are measured via regression of $\ln N(r)$ against $\ln (1/r)$ , producing stage-specific thresholds validated by statistical analysis (Elkington et al., 2020).
NLP feature extraction: LLMs parse unstructured text to extract tumor sizes, node counts, or invasion patterns by explicit regex matching; the extracted values are then mapped to staging variables (Lee et al., 2 Nov 2025).
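As a rough illustration of the regex-matching step, the sketch below pulls a tumor size and a positive-node count from a report string. The patterns, the function name `extract_features`, and the sample report are all assumptions made for this example; they are not the patterns used in the cited work, and real reports would need far more variants.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns: a number followed by cm/mm, and "X of Y (lymph) nodes positive".
SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(cm|mm)\b", re.IGNORECASE)
NODES_RE = re.compile(
    r"(\d+)\s*(?:of|/)\s*(\d+)\s+(?:lymph\s+)?nodes?\s+positive", re.IGNORECASE
)


def extract_features(report: str) -> Dict[str, Optional[float]]:
    """Pull the first tumor size (converted to cm) and positive-node count from a report."""
    features: Dict[str, Optional[float]] = {"size_cm": None, "positive_nodes": None}
    size = SIZE_RE.search(report)
    if size:
        value = float(size.group(1))
        features["size_cm"] = value / 10 if size.group(2).lower() == "mm" else value
    nodes = NODES_RE.search(report)
    if nodes:
        features["positive_nodes"] = int(nodes.group(1))
    return features


# Invented sample text, not taken from any dataset.
report = "Invasive carcinoma, tumor size 23 mm. 2 of 15 lymph nodes positive."
features = extract_features(report)
```

Fields that cannot be found stay `None`, so a downstream rule can decline to stage rather than act on a guessed value.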

Source	Input Features	Rule Structure
Imaging	Segmentation masks, metrics	Ordered threshold logic
Pathology/NLP	Free-text reports	LLM-induced if–then rules
Knowledge Bases	TNM, biomarkers, grade	Logical axioms (OWL/DL/SPARQL)
Morphometry	Fractal dimension $D_{f}$	Interval-based stage thresholding
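The box-counting estimate mentioned under fractal morphometry can be sketched as follows. The code assumes the image has already been reduced to a set of foreground pixel coordinates, which is a simplification of the mass-density images described in the source, and the function names are hypothetical. It counts occupied boxes $N(r)$ at several box sizes $r$ and returns the least-squares slope of $\ln N(r)$ against $\ln(1/r)$.

```python
from math import log
from typing import Iterable, Sequence, Set, Tuple

Pixel = Tuple[int, int]


def box_count(pixels: Set[Pixel], r: int) -> int:
    """N(r): number of r-by-r grid boxes containing at least one foreground pixel."""
    return len({(x // r, y // r) for x, y in pixels})


def fractal_dimension(pixels: Iterable[Pixel], box_sizes: Sequence[int]) -> float:
    """Least-squares slope of ln N(r) against ln(1/r).

    Requires a non-empty pixel set and at least two distinct box sizes.
    """
    occupied = set(pixels)
    xs = [log(1.0 / r) for r in box_sizes]
    ys = [log(box_count(occupied, r)) for r in box_sizes]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    return numerator / denominator


# Sanity checks on shapes with known dimension: a filled square and a straight line.
square = [(x, y) for x in range(16) for y in range(16)]
line = [(x, 0) for x in range(16)]
```

With box sizes `(1, 2, 4, 8)`, the filled square yields a slope close to 2 and the line a slope close to 1, which is a quick way to validate the estimator before applying it to tissue images.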

5. Representative Rule Systems

5.1. Lung Cancer (T-Stage, Imaging-Based)

Sequentially evaluated rules:

$\begin{aligned} &T4: (D_{max} > 7.0) \lor (d_{\min}^{(\mathrm{mediastinum})}=0) \lor (d_{\min}^{(\mathrm{diaphragm})}=0) \\ &T1: (D_{max}\leq 3.0) \land (d_{\min}^{(\mathrm{lung})}>0) \\ &T2: 3.0 < D_{max}\leq 5.0 \\ &T3: 5.0 < D_{max}\leq 7.0 \end{aligned}$

with all measurements explicit and thresholds aligned with IASLC/AJCC TNM 8th Edition (Chowdhury et al., 24 Nov 2025).
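The ordered evaluation above can be written as a first-match cascade. This is a minimal sketch that transcribes the four listed conditions as written; the function and parameter names are hypothetical, $D_{max}$ is assumed to be in centimetres (consistent with the TNM 8th Edition size cut-offs), and returning `None` when no rule fires is a choice made here, not something the source specifies.

```python
from typing import Optional


def lung_t_stage(d_max: float, d_mediastinum: float,
                 d_diaphragm: float, d_lung: float) -> Optional[str]:
    """Apply the rules in the order listed; the first matching rule wins.

    d_max is assumed to be in cm; the d_* arguments are the minimal distances
    from the tumor to each structure (0 means contact).
    """
    if d_max > 7.0 or d_mediastinum == 0 or d_diaphragm == 0:
        return "T4"
    if d_max <= 3.0 and d_lung > 0:
        return "T1"
    if 3.0 < d_max <= 5.0:
        return "T2"
    if 5.0 < d_max <= 7.0:
        return "T3"
    return None  # no listed rule fired (e.g. d_max <= 3.0 with d_lung == 0)
```

Because T4 is checked first, a small tumor touching the mediastinum is staged T4 regardless of diameter, which is why the evaluation order matters.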

5.2. Breast Cancer (Structure-Extracted or NLP)

AJCC-guideline-derived rules:

T-stage: If $0 < d \leq 2.0$ cm $\rightarrow$ T1; $2.0 < d \leq 5.0$ cm $\rightarrow$ T2; $d > 5.0$ cm $\rightarrow$ T3; direct skin/chest wall extension $\rightarrow$ T4.
N-stage: No node metastasis $\rightarrow$ N0; 1–3 positive nodes $\rightarrow$ N1; 4–9 positive nodes $\rightarrow$ N2; $\geq 10$ positive nodes, or level III, internal mammary, or supraclavicular involvement $\rightarrow$ N3 (Lee et al., 2 Nov 2025).
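The T- and N-stage rules listed above translate directly into two small functions. The sketch below is illustrative: the function and parameter names are hypothetical, and checking skin or chest wall extension before tumor size is an assumed precedence that the listed rules do not state explicitly.

```python
from typing import Optional


def breast_t_stage(d_cm: float, skin_or_chest_wall_extension: bool = False) -> Optional[str]:
    """T category from tumor diameter in cm; direct extension is checked first (assumed)."""
    if skin_or_chest_wall_extension:
        return "T4"
    if 0 < d_cm <= 2.0:
        return "T1"
    if 2.0 < d_cm <= 5.0:
        return "T2"
    if d_cm > 5.0:
        return "T3"
    return None  # d_cm <= 0 is not covered by the listed rules


def breast_n_stage(positive_nodes: int, high_risk_site_involved: bool = False) -> str:
    """N category from positive-node count.

    high_risk_site_involved stands for level III, internal mammary,
    or supraclavicular involvement.
    """
    if positive_nodes >= 10 or high_risk_site_involved:
        return "N3"
    if positive_nodes >= 4:
        return "N2"
    if positive_nodes >= 1:
        return "N1"
    return "N0"
```

For example, a 2.3 cm tumor with two positive nodes gives `breast_t_stage(2.3)` of `"T2"` and `breast_n_stage(2)` of `"N1"`.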

5.3. Fractal Dimension Staging

Modal thresholds for $D_{f}$ across cancer types; e.g., for pancreatic cancer,

$D_{f} < 1.6329 \Rightarrow$ Normal
$1.6329 \leq D_{f} < 1.6770 \Rightarrow$ Stage I
$1.6770 \leq D_{f} < 1.7137 \Rightarrow$ Stage II
$D_{f} \geq 1.7137 \Rightarrow$ Stage III

Analogous interval rules apply to breast, colon, and prostate cancers (Elkington et al., 2020).
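Interval rules like the pancreatic thresholds above map naturally onto a sorted cut-off list and a binary search. This sketch uses the three cut-offs quoted in the text; the constant and function names are hypothetical, and cut-offs for other cancer types would need to be taken from the cited study.

```python
from bisect import bisect_right

# Pancreatic cut-offs as listed in the text, in ascending order.
PANCREATIC_CUTOFFS = [1.6329, 1.6770, 1.7137]
LABELS = ["Normal", "Stage I", "Stage II", "Stage III"]


def stage_from_fractal_dimension(d_f: float) -> str:
    """Map D_f to a label; each cut-off value belongs to the interval above it."""
    return LABELS[bisect_right(PANCREATIC_CUTOFFS, d_f)]
```

Using `bisect_right` makes each lower bound inclusive, matching the $\leq$ on the left side of the intervals.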

6. Performance Metrics and Interpretability

Rule-based systems report high concordance with expert staging, with recent pipelines achieving overall F1-scores ≥0.9 for multi-class T-stage prediction from imaging (Chowdhury et al., 24 Nov 2025). LLM-induced rule extractors outperform zero-shot and retrieval baselines by 3–6 F1 points on text-based pathology staging (Lee et al., 2 Nov 2025).

A distinguishing property is complete interpretability: explicit, auditable, and modifiable logic supports clinical review, regulatory inspection, and rapid adaptation as guidelines evolve. Reasoning traces, explanations, or links to evidence (e.g., rule–guideline span alignment) are routinely provided, supporting transparency.
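One way to picture a reasoning trace is a rule evaluator that records every rule it checks along with the outcome. The sketch below is an assumed design, not the mechanism of any cited system; the `Rule` class and `evaluate` function are hypothetical, and the example rules reuse the breast T-stage size thresholds from Section 5.2.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    label: str
    description: str  # human-readable condition shown to reviewers
    predicate: Callable[[Dict[str, float]], bool]


# Size rules from the breast T-stage list in Section 5.2, used as example content.
RULES = [
    Rule("T1", "0 < d <= 2.0 cm", lambda f: 0 < f["d"] <= 2.0),
    Rule("T2", "2.0 < d <= 5.0 cm", lambda f: 2.0 < f["d"] <= 5.0),
    Rule("T3", "d > 5.0 cm", lambda f: f["d"] > 5.0),
]


def evaluate(features: Dict[str, float],
             rules: List[Rule]) -> Tuple[Optional[str], List[str]]:
    """Return the first matching label plus a trace of every rule checked."""
    trace: List[str] = []
    for rule in rules:
        fired = rule.predicate(features)
        outcome = "matched" if fired else "not matched"
        trace.append(f"{rule.label}: {rule.description} -> {outcome}")
        if fired:
            return rule.label, trace
    return None, trace


label, trace = evaluate({"d": 3.2}, RULES)
```

Here `label` is `"T2"` and `trace` lists the T1 rule as not matched followed by the T2 rule as matched, giving a reviewer the exact path to the assigned category.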

7. Applications, Extensions, and Limitations

Rule-based tumor staging underpins clinical decision support, cohort analysis, and protocolization for imaging and digital pathology. Ontology-driven approaches support multi-version guideline migration and complex Boolean rule composition including biomarkers, grade, and other molecular features (Seneviratne et al., 2018). Hybrid classical-quantum pipelines have demonstrated proof-of-concept mappings from TNM to stage within quantum logic circuits (Moret-Bonillo et al., 2023).

Notable current limitations include the difficulty of encoding every edge case, multifocal presentation, or ambiguous finding in a rule set; reliance on precise extraction of the required features, since segmentation or information-extraction errors propagate into the assigned stage; and, in restricted implementations, omission of certain TNM subcategories or anatomic invasions. None of the surveyed frameworks comprehensively covers cases such as separate ipsilateral nodules, pleural effusions, or full biomarker subcategorization unless these are explicitly encoded.

References

"An Anatomy Aware Hybrid Deep Learning Framework for Lung Cancer Tumor Stage Classification" (Chowdhury et al., 24 Nov 2025)
"Hybrid Classic-Quantum Computing for Staging of Invasive Ductal Carcinoma of Breast" (Moret-Bonillo et al., 2023)
"Knowledge Integration for Disease Characterization: A Breast Cancer Example" (Seneviratne et al., 2018)
"Detection of cancer stages through fractal dimension analysis of tissue microarrays (TMA) via optical transmission microscopy" (Elkington et al., 2020)
"Knowledge Elicitation with LLMs for Interpretable Cancer Stage Identification from Pathology Reports" (Lee et al., 2 Nov 2025)