CoTox: Chain-of-Thought Toxicity Framework
- CoTox is a computational framework that integrates chemical and biological data using chain-of-thought reasoning for transparent, organ-specific toxicity predictions in drug development.
- It employs multi-step, LLM-guided inference that combines IUPAC chemical descriptors with curated toxicity-related pathways and GO annotations to enhance interpretability.
- Empirical results show CoTox outperforms conventional structure-only models with an average F1-score of ~0.663, supporting its use in early-stage drug safety assessment.
CoTox is a computational framework for molecular toxicity prediction in drug development that integrates large language models (LLMs) with chain-of-thought (CoT) reasoning. By combining chemical structure representations (specifically, IUPAC names) with curated biological pathway information and Gene Ontology (GO) annotations, CoTox generates interpretable, organ-specific toxicity predictions. This approach enables stepwise, mechanistically grounded reasoning, addressing limitations of traditional machine learning models, which often lack interpretability and cannot readily incorporate complex biological context (Park et al., 5 Aug 2025).
1. Framework Architecture and Inputs
CoTox employs a multimodal input schema which fuses chemical and biological data:
- Chemical Structure Representation: IUPAC names are used in place of SMILES, selected for their descriptive clarity and alignment with LLM language understanding.
- Biological Context: Pathways and toxicity-relevant GO annotations are sourced from the Comparative Toxicogenomics Database (CTD) and further filtered using LLMs (e.g., GPT-4o) to retain only entries relevant to toxicity.
- Prompt Engineering: Customized, multi-step prompts explicitly guide the LLM through stepwise reasoning per toxicity endpoint, enhancing transparency and biological plausibility of predictions.
A simplified procedural outline for each toxicity type is:
- Retrieve biological features (toxicity-related pathways and GO terms) and chemical structure (IUPAC name).
- Reason about the biological features (signaling pathways and functional annotations).
- Reason about the chemical structure (toxicophores and chemical motifs).
- Integrate the biological and chemical analyses to produce a mechanistically informed prediction ("Toxic" or "Non-toxic").
- Return a structured JSON output containing a detailed reasoning trace and a binary toxicity decision.
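The final step of the outline, a structured JSON reply carrying a reasoning trace and a binary decision, can be illustrated with a small parser. This is a minimal sketch: the field names `reasoning` and `prediction`, the `ToxicityResult` class, and the example reply are assumptions made for illustration and are not taken from the CoTox codebase.

```python
import json
from dataclasses import dataclass

# The two labels named in the outline above.
ALLOWED_LABELS = {"Toxic", "Non-toxic"}


@dataclass
class ToxicityResult:
    endpoint: str     # e.g. "liver" (illustrative endpoint name)
    reasoning: dict   # step name -> free-text explanation
    prediction: str   # "Toxic" or "Non-toxic"


def parse_llm_output(endpoint: str, raw: str) -> ToxicityResult:
    """Parse and validate one JSON reply. Field names are hypothetical."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    reasoning = data.get("reasoning")
    prediction = data.get("prediction")
    if not isinstance(reasoning, dict) or not reasoning:
        raise ValueError("missing or empty 'reasoning' trace")
    if prediction not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {prediction!r}")
    return ToxicityResult(endpoint, reasoning, prediction)


# Hand-written stand-in for a model reply (not real model output).
example_reply = json.dumps({
    "reasoning": {
        "pathways": "placeholder explanation",
        "go_terms": "placeholder explanation",
        "structure": "placeholder explanation",
        "synthesis": "placeholder explanation",
    },
    "prediction": "Toxic",
})

result = parse_llm_output("liver", example_reply)
print(result.endpoint, result.prediction)
```

Rejecting any label outside the two allowed values keeps downstream scoring binary, which is one plausible way to handle replies that drift from the requested format.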
2. Chain-of-Thought Reasoning Process
The central methodological innovation of CoTox is its chain-of-thought prompting strategy, operationalized as a four-step analytical sequence:
- Pathway Analysis: Extraction and analysis of toxicity-related pathways from CTD, focusing only on those with mechanistic relevance as filtered by the LLM.
- GO Term Analysis: Examination of associated GO terms to interpret which biological processes or molecular functions may be perturbed by the compound.
- Chemical Structure Parsing: Identification of structural features, such as functional groups or ring systems, via IUPAC names (using PubChemPy or equivalent tools), highlighting known chemical toxicity motifs.
- Synthesis and Prediction: Integration of biological and chemical insights to generate an interpretable, organ-specific toxicity classification.
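Algorithmically, the process amounts to computing, for each organ toxicity endpoint, a prediction from the compound's IUPAC name and its curated pathway and GO term lists, with the LLM-driven, multi-step inference acting as the prediction function.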
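The four-step sequence and the per-endpoint loop can be sketched as follows. The prompt wording, the function names, and the JSON keys are all assumptions for illustration; the actual prompts are the ones published with the CoTox code. The `llm` argument is any callable from prompt text to reply text, and a stub is used here so the sketch runs offline.

```python
import json
from typing import Callable, Dict, List

# Illustrative wording for the four steps described above.
STEPS = [
    "Step 1 (Pathway analysis): explain which listed pathways are mechanistically relevant.",
    "Step 2 (GO term analysis): explain which processes or functions may be perturbed.",
    "Step 3 (Chemical structure): identify functional groups and known toxicity motifs in the IUPAC name.",
    "Step 4 (Synthesis): integrate steps 1-3 and decide 'Toxic' or 'Non-toxic'.",
]


def build_prompt(endpoint: str, iupac_name: str,
                 pathways: List[str], go_terms: List[str]) -> str:
    """Assemble one endpoint-specific chain-of-thought prompt."""
    lines = [
        f"Task: assess {endpoint} toxicity of the compound below.",
        f"IUPAC name: {iupac_name}",
        "Toxicity-related pathways: " + "; ".join(pathways),
        "Toxicity-related GO terms: " + "; ".join(go_terms),
        "Reason step by step:",
        *STEPS,
        'Answer in JSON with keys "reasoning" and "prediction".',
    ]
    return "\n".join(lines)


def predict_all(endpoints: List[str], iupac_name: str, pathways: List[str],
                go_terms: List[str], llm: Callable[[str], str]) -> Dict[str, dict]:
    """Run the same multi-step inference once per toxicity endpoint."""
    results = {}
    for endpoint in endpoints:
        reply = llm(build_prompt(endpoint, iupac_name, pathways, go_terms))
        results[endpoint] = json.loads(reply)
    return results


def fake_llm(prompt: str) -> str:
    """Offline stub standing in for a real model call."""
    return json.dumps({"reasoning": {"note": "stub"}, "prediction": "Non-toxic"})


results = predict_all(
    endpoints=["liver", "pulmonary"],  # illustrative subset of endpoints
    iupac_name="1-naphthalen-1-yloxy-3-(propan-2-ylamino)propan-2-ol",  # propranolol
    pathways=["Intrinsic Pathway for Apoptosis"],
    go_terms=["positive regulation of oxidative stress-induced cell death"],
    llm=fake_llm,
)
print(results)
```

Passing the model as a callable keeps the prompt logic independent of any particular provider, so a GPT-4o or Gemini client could be swapped in without touching the rest of the sketch.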
3. Data Integration and Curation
CoTox's dual-input design maximizes interpretability and explanation fidelity:
- IUPAC vs. SMILES: IUPAC names provide detailed, human-interpretable structural descriptors which facilitate LLM comprehension more effectively than SMILES.
- Biological Annotation Filtering: Initial annotation lists from CTD are pruned by GPT-4o to exclude non-relevant pathways and GO terms, yielding a curated set reflecting plausible mechanisms of toxicity.
- Mechanism-Based Prediction: By integrating both chemical structure and biological context, CoTox moves beyond pattern-matching, supporting predictions that explicitly connect molecular features with toxicity mechanisms.
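The annotation-filtering step can be sketched as a prompt that lists candidate terms and a parser that keeps only the entries the model selects. The prompt text, the comma-separated reply format, and the keyword-based stub below are assumptions for illustration; in CoTox the selection is made by GPT-4o, not by keyword matching, and the last two candidate terms are invented examples of irrelevant entries.

```python
from typing import Callable, List


def build_filter_prompt(terms: List[str]) -> str:
    """Number the candidate annotations so the reply can refer to them."""
    numbered = "\n".join(f"{i}. {t}" for i, t in enumerate(terms, 1))
    header = ("Return the numbers of the entries plausibly related to "
              "toxicity mechanisms, as a comma-separated list")
    return header + "\n" + numbered


def filter_terms(terms: List[str], llm: Callable[[str], str]) -> List[str]:
    """Keep only the terms whose numbers appear in the model's reply."""
    reply = llm(build_filter_prompt(terms))
    keep = set()
    for token in reply.split(","):
        token = token.strip()
        if token.isdigit() and 1 <= int(token) <= len(terms):
            keep.add(int(token))
    return [t for i, t in enumerate(terms, 1) if i in keep]


def keyword_stub(prompt: str) -> str:
    """Offline stand-in for the model: selects lines containing a few keywords."""
    keywords = ("apoptosis", "oxidative stress", "cell death")
    hits = []
    for line in prompt.splitlines():
        number, _, text = line.partition(". ")
        if number.isdigit() and any(k in text.lower() for k in keywords):
            hits.append(number)
    return ", ".join(hits)


candidates = [
    "Intrinsic Pathway for Apoptosis",
    "positive regulation of oxidative stress-induced cell death",
    "Olfactory signaling pathway",   # illustrative irrelevant entry
    "circadian rhythm",              # illustrative irrelevant entry
]
kept = filter_terms(candidates, keyword_stub)
print(kept)
```

Ignoring tokens that are not valid indices means a malformed reply can only shrink the curated list, never add terms that were not in the original annotations.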
4. Performance Comparison and Empirical Results
Performance evaluation was conducted on the UniTox dataset encompassing six organ-specific toxicity types. Key results are summarized in the following table:
| Method | Average F1-score | Input Modality | 
|---|---|---|
| CoTox (GPT-4o + IUPAC + Bio) | ~0.663 | IUPAC + Pathways + GO terms | 
| Chemprop | 0.619 | Structure-only (SMILES) | 
| XGBoost | 0.576 | Structure-only (SMILES + descriptors) | 
| Structure-only LLMs | 0.368–0.434 | Structure-only (SMILES/IUPAC) | 
Reasoning-focused LLMs guided by CoTox prompts (e.g., Gemini-2.5-Pro, GPT-4o) consistently outperform structure-only approaches, with a reported margin of approximately 10–15% in F1-score. This suggests that LLMs benefit substantively from enriched, interpretable inputs, namely the IUPAC representation and the biological context.
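To make the size of the gaps in the table concrete, the short calculation below derives absolute and relative F1 margins between CoTox and the two conventional baselines. It uses only the average scores from the table; the helper name `margins` is illustrative, and treating "~0.663" as exactly 0.663 is an assumption.

```python
# Average F1-scores copied from the table above.
scores = {
    "CoTox (GPT-4o + IUPAC + Bio)": 0.663,
    "Chemprop": 0.619,
    "XGBoost": 0.576,
}


def margins(reference: float, baseline: float):
    """Return (absolute gap, gap relative to the baseline score)."""
    absolute = reference - baseline
    relative = absolute / baseline
    return absolute, relative


reference = scores["CoTox (GPT-4o + IUPAC + Bio)"]
summary = {}
for name in ("Chemprop", "XGBoost"):
    absolute, relative = margins(reference, scores[name])
    summary[name] = (round(absolute, 3), round(100 * relative, 1))
    print(f"{name}: +{absolute:.3f} F1 absolute, +{100 * relative:.1f}% relative")
```

On these table values the absolute gaps are about 0.044 (Chemprop) and 0.087 (XGBoost). A relative figure depends on which baseline is used as the denominator, so it is worth stating which kind of margin is meant when quoting a percentage.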
5. Application in Drug Development Workflows
CoTox is intended for early-stage drug safety assessment:
- Cell-Type Simulation: Simulates compound treatment in organ-specific cell lines (e.g., HEPG2 for liver, A549 for lung, HA1E for kidney).
- Gene Set Enrichment Analysis (GSEA): Utilizes GSEA-derived pathway and GO term lists reflecting observed or hypothetical cell responses.
- Organ-Specific Contextualization: Prompts for each toxicity endpoint are tailored to these biological contexts, yielding predictions in alignment with specific tissue physiology.
In one application, CoTox was used to model Entecavir-induced effects, providing predictions for liver, pulmonary, and renal toxicity—demonstrating its integration of explicit biological context in risk assessment scenarios.
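The organ-specific contextualization described in this section can be sketched as a lookup from organ to cell line plus a selection of enriched terms. The organ-to-cell-line mapping comes from the text above; the row format `(term, FDR)`, the 0.05 cutoff, the `top_n` limit, and the placeholder rows are assumptions for illustration and do not reflect actual GSEA output or the settings used by CoTox.

```python
from typing import Dict, List, Tuple

# Mapping stated in the text: organ -> cell line used for simulated treatment.
CELL_LINE_BY_ORGAN = {"liver": "HEPG2", "lung": "A549", "kidney": "HA1E"}


def select_terms(gsea_rows: List[Tuple[str, float]],
                 fdr_cutoff: float = 0.05, top_n: int = 10) -> List[str]:
    """Keep terms at or below the FDR cutoff, most significant first."""
    significant = [(term, fdr) for term, fdr in gsea_rows if fdr <= fdr_cutoff]
    significant.sort(key=lambda row: row[1])
    return [term for term, _ in significant[:top_n]]


def organ_context(organ: str,
                  gsea_rows: List[Tuple[str, float]]) -> Dict[str, object]:
    """Bundle the biological context that an endpoint-specific prompt would use."""
    return {
        "organ": organ,
        "cell_line": CELL_LINE_BY_ORGAN[organ],
        "enriched_terms": select_terms(gsea_rows),
    }


# Placeholder rows with made-up FDR values, for illustration only.
mock_rows = [("term_A", 0.20), ("term_B", 0.01), ("term_C", 0.04)]
context = organ_context("liver", mock_rows)
print(context)
```

Sorting by FDR before truncating means that, if the list must be shortened to fit a prompt, the least significant terms are dropped first.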
6. Case Study: Mechanistic Reasoning and Output Structure
A detailed case study is presented using propranolol and Gemini-2.5-Pro:
- Reasoning Trace: Output includes a JSON structure specifying stepwise logic for each endpoint, referencing pathways such as "Intrinsic Pathway for Apoptosis" and GO terms like "positive regulation of oxidative stress-induced cell death."
- Chemical Analysis: Highlighted IUPAC-derived features (e.g., "propanolamine side chain," "naphthalene ring") are mapped to mechanisms, for example relating beta-adrenergic antagonism to apoptosis in cardiac tissue, or CYP2E1-mediated metabolism of the naphthalene ring to hepatic cellular injury.
- Prediction Transparency: Outputs offer causal, mechanistically relevant justifications, in contrast to black-box predictions of structure-only models.
This demonstrates that CoTox predictions are aligned with physiological understanding and offer rich interpretability supporting regulatory and translational decision-making.
7. Code Accessibility and Reproducibility
Full codebase and prompt definitions used for the CoTox framework are available on GitHub (https://github.com/dmis-lab/CoTox). Public release of source materials facilitates reproducibility and supports adaptation of the method for broader drug safety and toxicological research contexts.
CoTox represents an integrative advance in in silico toxicity assessment, achieving improved predictive accuracy over conventional machine learning and deep learning baselines through the confluence of chemically descriptive language, biologically curated input, and explicit chain-of-thought reasoning. Its utility for early-stage drug development is underscored by transparency and mechanistic insight, enabling informed, physiologically anchored safety predictions (Park et al., 5 Aug 2025).
 
          