Systematic Taxonomy of Errors
- A systematic taxonomy of errors is a comprehensive framework that partitions error phenomena in scientific, computational, and engineering systems into mutually exclusive, collectively exhaustive categories.
- It enables precise error attribution by mapping observable errors to defined categories, which facilitates targeted debugging, remediation, and quality assurance.
- Its applications span diverse domains—such as machine learning, software engineering, and quantum systems—demonstrating its impact on systematic error reduction.
A systematic taxonomy of errors provides an exhaustive, principled classification of error phenomena in scientific, computational, and engineering systems. Taxonomies of this type enable precise error attribution, targeted debugging, and rational mitigation strategies by mapping every observable error to well-defined, mutually exclusive categories. These frameworks are foundational in many domains—machine learning, software engineering, quantum systems, spreadsheet auditing, linguistic analysis, and physical measurements—allowing practitioners to diagnose, quantify, and ultimately minimize errors at scale.
1. Foundational Principles and Formal Properties
Systematic error taxonomies are grounded in three principles: mutual exclusivity (categories do not overlap), collective exhaustiveness (every error instance is assigned to a category), and operational applicability (categories admit diagnostic rules and remediation strategies). Modern taxonomies often specify formal tests, mathematical criteria, or set-theoretic definitions for each category, facilitating both algorithmic implementation and empirical validation.
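The first two principles are set-theoretic: a valid taxonomy induces a partition of the observed error set. A minimal sketch of that check, with invented error identifiers and category names, might look like:

```python
# Hypothetical sketch: test whether a proposed category map forms a partition
# of the observed error set (mutual exclusivity + collective exhaustiveness).
def is_partition(universe, categories):
    """Return True iff the category sets are disjoint and jointly cover `universe`."""
    covered = set()
    for members in categories.values():
        if covered & members:        # overlap -> categories not mutually exclusive
            return False
        covered |= members
    return covered == universe       # full coverage -> collectively exhaustive

# Invented example data for illustration only.
errors = {"e1", "e2", "e3", "e4"}
taxonomy = {"mislabeling": {"e1"}, "learner": {"e2"}, "boundary": {"e3", "e4"}}
assert is_partition(errors, taxonomy)
```

The third principle, operational applicability, is not checkable this way; it requires that each category come with a diagnostic rule, as in the domain examples below.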
For example, in predictive classification, error types are defined relative to a specific tuple of training set, feature set, and learner, and operationalized by the behavior of the output classifier against the target classifier (Meek, 2016):
- Every prediction error is either a mislabeling, learner, representation, or boundary error.
- Propositions establish that these types are mutually exclusive and jointly exhaustive for all prediction errors.
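Exclusivity and exhaustiveness can be guaranteed constructively by casting the four types as an ordered decision procedure. The predicate names below are illustrative stand-ins for the formal tests, not Meek's actual notation:

```python
# Sketch of an ordered decision procedure over the four prediction-error types.
# The dictionary keys are hypothetical predicates, not Meek's (2016) formalism.
def categorize(error):
    """Assign an error record to exactly one of the four types."""
    if error.get("label_wrong"):
        return "mislabeling"          # the training label itself is incorrect
    if not error.get("hypothesis_can_fit"):
        return "representation"       # no hypothesis-class member fits the data
    if not error.get("learner_found_fit"):
        return "learner"              # a fit exists, but the learner missed it
    return "boundary"                 # residual class: generalization mistake

# Early returns make the types mutually exclusive; the residual case
# makes them jointly exhaustive.
```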
2. Core Error Categories Across Domains
Taxonomies in disparate fields frequently converge on several core error classes, though domain-specific refinements are common. Representative examples include:
- Prediction Error Taxonomy (ML/AI):
- Mislabeling Error: Incorrect training label, arising from human or automated annotation mistakes.
- Representation Error: No classifier in the hypothesis class can fit the training set; indicates feature or model class insufficiency.
- Learner Error: A fitting classifier exists in principle, but the learner fails to find it (due to optimization failure or objective mis-specification).
- Boundary Error: A generalization mistake on a test instance that disappears if that instance, correctly labeled, is added to the training set; typically due to sparse sampling near decision boundaries. These types are distinguished by formal conditions on the relationship between the output classifier, the training data, the target classifier, and the data distribution (Meek, 2016).
- Spreadsheet Error Taxonomies:
- Panko–Halverson: Separates Violations (intentional policy breaches) from Errors (unintentional), further subdividing errors into Qualitative (hard-coding, layout, maintainability flaws) and Quantitative (mistakes, slips, lapses), indexed by lifecycle stage and context level (0809.3613).
- Rajalingham–Chadwick–Knight: Binary tree with System-Generated vs User-Generated, then by Quantitative/Qualitative, with fine-grained subcategories (e.g., omission, alteration, duplication, semantic, temporal, maintainability errors) (0805.4224).
- Asheetoxy: Organizes observable negative spreadsheet phenomena into Wrong Actions, Anomalies, Defects (imperfections/faults), Failures, and Problems, establishing a causally-ordered, phenomenon-oriented typology (Kulesz et al., 2018).
- Prompt Defect Taxonomy (LLM Systems):
- Six superordinate axes: Specification/Intent, Input/Content, Structure/Formatting, Context/Memory, Performance/Efficiency, Maintainability/Engineering, each with fine-grained defect classes paired with downstream impacts and mitigation strategies (Tian et al., 17 Sep 2025).
- Grammatical Error Taxonomies:
- Rigorous classification schemes are evaluated by exclusivity, coverage, balance, and usability, with the best-performing structures organizing errors by word-level (spelling/orthography), inter-word syntax/morphology, and discourse-level (punctuation, tense, sentence structure) (Zou et al., 17 Feb 2025).
- HPC I/O Modeling Error Taxonomy:
- Five additive sources: Application modeling, Global system, Generalization/out-of-distribution, I/O contention, and Irreducible noise, each with explicit quantification guidelines and litmus tests (Isakov et al., 2022).
- Physical Sciences Example (MICROSCOPE/WEP Test):
- Systematic errors partitioned into External perturbations (e.g., gravity gradient, drag), Satellite-design perturbations (thermal, magnetic, self-gravity), and Instrument-internal sources (off-centring, scale-factor drift, non-linearities), each modeled mathematically to bound the Eötvös parameter uncertainty (Rodrigues et al., 2021).
3. Diagnostic Methodologies and Quantitative Tools
Diagnosis within systematic error taxonomies proceeds via structured sequences, often algorithmic:
- Detection/Auditing: Compute errors on training and test data; form minimal invalidation sets for training mistakes (Meek, 2016).
- Source Attribution: Identify whether errors are due to labeling (mislabeling), model capacity (representation), algorithm optimization (learner), or insufficient boundary data (boundary).
- Remediation Sequencing: Audit/correct labels (for mislabeling), engineer features/hypotheses (for representation), tune or exchange learners (for learner), or perform active sampling (for boundary).
- Recursion: Iterate diagnosis–repair until only generalization errors remain, then focus on boundary or representation refinement.
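The four-step sequence above can be sketched as a loop that applies remedies until only generalization (boundary) errors remain. The `attribute` and `apply_remedy` callables are placeholders for domain-specific machinery:

```python
# Sketch of the diagnose-repair recursion described above; `attribute` maps an
# error to one of the four types and `apply_remedy` is a placeholder hook.
REMEDIES = {
    "mislabeling": "audit and correct labels",
    "representation": "engineer features / enrich hypothesis class",
    "learner": "tune or replace the learner",
}

def diagnose_and_repair(errors, attribute, apply_remedy, max_rounds=10):
    """Iterate diagnosis and repair until only boundary errors remain."""
    for _ in range(max_rounds):
        remaining = []
        for err in errors:
            kind = attribute(err)
            if kind == "boundary":          # generalization error: defer
                remaining.append(err)
            else:
                apply_remedy(REMEDIES[kind], err)
        if len(remaining) == len(errors):   # fixed point: only boundary errors left
            return remaining
        errors = remaining
    return errors
```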
In HPC I/O, sequential litmus tests are defined for each error component (e.g., a duplicate-job mean predictor, time features and system logs, and predictive-ensemble disagreement), with direct correspondences to observed error reduction when each component is addressed (Isakov et al., 2022).
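Because the five sources are additive, an error budget can be checked by summing independently estimated components against the total observed error. The numbers below are invented for illustration:

```python
# Hedged sketch of an additive error budget in the style of the HPC I/O
# taxonomy; component magnitudes here are invented, not from Isakov et al.
components = {
    "application_modeling": 0.10,
    "global_system": 0.05,
    "generalization_ood": 0.07,
    "io_contention": 0.04,
    "irreducible_noise": 0.02,
}
total_observed = 0.30
unexplained = total_observed - sum(components.values())
# A small residual suggests the taxonomy accounts for the bulk of the error;
# a large one indicates a missing category or a mis-estimated component.
assert abs(unexplained - 0.02) < 1e-6
```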
4. Domain-Specific Extensions and Adaptations
Systematic taxonomies are explicitly adapted to new technical domains as system complexity increases:
- Attention-Based Neural Networks: Seven new attention-specific fault classes (masking, QKV projection, kernel integration, score computation, positional encoding, KV cache, variant selection) were identified as comprising over half of observed ABNN faults, with high-confidence diagnostic heuristics for key categories (Jahan et al., 6 Aug 2025).
- Hybrid Quantum–Classical Architectures: Errors are mapped to the step in the workflow (parametrization, conceptualization, API, optimization, quantum circuit, measurement, GPU), then to subcategory by root cause and symptom, enabling targeted quality assurance and systematic dataset construction (Bensoussan et al., 12 Feb 2025).
- Visual Reasoning in Radiology: AI errors are classified as Perceptual (under-detection, over-detection, mislocalization), Interpretive (misattribution, premature closure), Communication (findings–summary discordance), with cognitive-bias modifiers (confirmation, anchoring, inattentional, framing) enhancing the explanatory granularity (Datta et al., 29 Sep 2025).
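Multi-axis schemes like the radiology taxonomy (a primary error class plus optional bias modifiers) lend themselves to typed records that reject invalid combinations. The encoding below is a hypothetical illustration, not an artifact of the cited work:

```python
# Hypothetical record type for a multi-axis taxonomy: one primary error class
# plus zero or more cognitive-bias modifiers, as in the radiology scheme.
from dataclasses import dataclass, field

PRIMARY = {"perceptual", "interpretive", "communication"}
MODIFIERS = {"confirmation", "anchoring", "inattentional", "framing"}

@dataclass(frozen=True)
class RadiologyError:
    primary: str
    modifiers: frozenset = field(default_factory=frozenset)

    def __post_init__(self):
        # Reject category values outside the published axes.
        assert self.primary in PRIMARY
        assert self.modifiers <= MODIFIERS

err = RadiologyError("perceptual", frozenset({"anchoring"}))
```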
Tables are often used to summarize the mapping from error type to canonical signature, main sources, and preferred remedies:
| Error Type | Signature | Main Sources | Remedies/Tools |
|---|---|---|---|
| Mislabeling | Training label disagrees with the target | Label mistake, ambiguity | Invalidation set, audit |
| Learner | A fitting classifier exists, but the learner's output does not fit | Optimization failure, objective mis-specification | Consistent learner, hyperparameter sweep |
| Representation | No classifier in the hypothesis class fits the training set | Missing features, restricted model class | Feature engineering, richer models |
| Boundary | Generalization mistake that vanishes once the labeled instance is added to training | Sparse sampling near boundary | Active/uncertainty sampling |
5. Empirical Performance, Validation, and Generalization
Taxonomies are rigorously validated both through controlled experiments (e.g., inter-rater reliability on spreadsheet errors (0809.3613), human annotation with Cohen's kappa for grammatical errors (Zou et al., 17 Feb 2025)) and via deployment on large-scale or high-impact cases (frontier radiology AI benchmarks (Datta et al., 29 Sep 2025), benchmarking LLM prompt failure in industry (Tian et al., 17 Sep 2025)). Exception rates, category balance, and detection reliability (e.g., error-classification accuracy, coverage, exclusivity) are systematically reported.
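Cohen's kappa, the chance-corrected agreement statistic used in such annotation studies, is straightforward to compute from two raters' category assignments. The labels below are invented examples:

```python
# Minimal Cohen's kappa computation for two raters' taxonomy assignments.
# kappa = (observed agreement - chance agreement) / (1 - chance agreement).
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from the product of marginal category frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Invented example: two raters labeling four errors with Reason-style classes.
a = ["slip", "lapse", "slip", "mistake"]
b = ["slip", "lapse", "lapse", "mistake"]
```

Values near 1 indicate the categories are applied consistently; low kappa despite high raw agreement signals that agreement is driven by a dominant category rather than a usable taxonomy.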
Cross-domain analogues are explicitly acknowledged:
- Violations/Errors: spreadsheet auditing (0809.3613), medical protocols, software standards.
- Quantitative/Qualitative: functional vs. non-functional bugs in software testing.
- Mistakes/Slips/Lapses: Reason (1990) tripartite cognitive-error model, found in human factors, linguistics, data entry, and HCI.
Generalization requires mapping domain-specific primitives (e.g. “spreadsheet implementation” vs “database query syntax”) while preserving the structural taxonomy criteria (0805.4224).
6. Impact and Forward Trajectories
Taxonomies of systematic errors have transformed both theoretical understanding and practical system development:
- Enable root-cause analysis and precise error attribution, supporting targeted mitigation and debugging.
- Benchmark error-detection and correction algorithms, setting measurable standards for model and system improvement.
- Guide dataset construction, annotation protocols, and quality assurance across scientific and engineering workflows.
- Underpin the design of diagnostic tools (e.g., auto-evaluation frameworks for RAG systems (Leung et al., 15 Oct 2025), error detection in ATS (Vendeville et al., 22 May 2025), or CI/CD hooks for LLM prompt validation (Tian et al., 17 Sep 2025)).
Continuing challenges include coverage extension to evolving system modalities (e.g., multimodal neural architectures, hybrid quantum–classical platforms), automation of error-type detection (especially for latent or non-observable failures), and the creation of dynamically versioned, adaptive taxonomies integrated with system lifecycles.
By enforcing rigorous categorization and operational definitions, systematic taxonomies of errors enable both fine-grained scientific inquiry and robust engineering for complex systems across domains.