
SenseCF: LLM-Driven Counterfactuals

Updated 28 January 2026
  • SenseCF is a framework that generates minimally invasive counterfactual explanations using fine-tuned LLMs to improve clinical intervention design and predictive model performance.
  • It employs a unified pipeline for counterfactual intervention and sample-level data augmentation, validated by metrics such as 0.99 validity and low feature perturbations.
  • Beyond digital health, the name “SenseCF” also appears in work on scalable cell-free ISAC and decision-aware synchronization; these are distinct uses of the term rather than extensions of the counterfactual framework.

SenseCF, in its most common usage, refers to a framework for generating minimally invasive, clinically actionable counterfactual explanations using fine-tuned LLMs for structured sensor-derived health data, and for employing these counterfactuals as augmented synthetic data to improve the robustness and performance of downstream predictive models. It operates as a unified pipeline for both counterfactual intervention design and sample-level data augmentation. Across related literature, the term “SenseCF” is also used to denote scalable and distributed cell-free integrated sensing and communication (CF-ISAC) frameworks, and decision-aware semantic state synchronization protocols in compute-first networking; however, its defining contribution lies in the LLM-driven counterfactual modeling paradigm for sensor-based digital health (Soumma et al., 21 Jan 2026).

1. Formal Foundations of Counterfactual Generation

SenseCF addresses the counterfactual generation problem for sensor-derived tabular instances. Given a set of factual examples $X = \{x_i\}_{i=1}^N \subset \mathbb{R}^d$ and corresponding labels $Y = \{y_i\}_{i=1}^N \subset \{0,1\}$, a black-box classifier $f: \mathbb{R}^d \to \{0,1\}$ produces predictions $\hat y_i = f(x_i)$. The counterfactual $x_i'$ for an instance $x_i$ is any perturbation whose prediction is flipped to $y_{\mathrm{target}} = 1 - \hat y_i$, subject to three desiderata:

  • Minimality: The perturbation distance $d(x_i, x_i') = \|x_{\mathrm{cont}} - x'_{\mathrm{cont}}\|_2 + \mathrm{Ham}(x_{\mathrm{cat}}, x'_{\mathrm{cat}})$ is minimized.
  • Validity: $f(x_i') = y_{\mathrm{target}}$.
  • Plausibility: $x_i'$ must conform to empirical data support $P_{\mathrm{data}}(x_i') \geq \tau$ and immutable feature constraints (e.g., age, sex).

The optimization becomes:

$$x_i'^{*} = \arg\min_{x'} d(x_i, x') \quad \text{s.t.} \quad f(x') = y_{\mathrm{target}},\; P_{\mathrm{data}}(x') \geq \tau,\; x'_j = x_j \;\forall j \in \mathcal{I}$$

where $\mathcal{I}$ indexes immutable features (Soumma et al., 21 Jan 2026).
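
The distance and admissibility conditions above can be sketched directly in Python. This is a minimal illustration, not the paper's implementation; the feature-index arguments (`cont_idx`, `cat_idx`, `immutable_idx`) are hypothetical conventions for separating continuous, categorical, and immutable features.

```python
import math

def cf_distance(x, x_cf, cont_idx, cat_idx):
    """Perturbation distance: L2 norm over continuous features plus
    Hamming distance over categorical features."""
    l2 = math.sqrt(sum((x[j] - x_cf[j]) ** 2 for j in cont_idx))
    ham = sum(x[j] != x_cf[j] for j in cat_idx)
    return l2 + ham

def is_admissible(x, x_cf, f, y_target, p_data, tau, immutable_idx):
    """Check the three desiderata: validity (prediction flips),
    plausibility (density above tau), and immutability."""
    if f(x_cf) != y_target:   # validity
        return False
    if p_data(x_cf) < tau:    # plausibility
        return False
    return all(x_cf[j] == x[j] for j in immutable_idx)  # immutability
```

A search over candidate perturbations would minimize `cf_distance` subject to `is_admissible` returning `True`, mirroring the constrained argmin above.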

2. Model Architecture and Fine-Tuning Regimen

SenseCF’s architecture fine-tunes an open-source LLM (e.g., LLaMA-3.1-8B) for structured-feature counterfactual generation. Parameter-efficient LoRA adapters are used, with the base model quantized to 4-bit (NF4). Fine-tuning is performed over two epochs, batch size 16, using A100 GPUs.
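
The training setup described above can be summarized as a configuration sketch. Only the base model, 4-bit NF4 quantization, LoRA adaptation, two epochs, batch size 16, and A100 hardware come from the source; the LoRA rank/alpha/dropout values and all field names are illustrative placeholders.

```python
# Illustrative fine-tuning configuration; field names and LoRA
# hyperparameter values are assumptions, not the paper's exact settings.
finetune_config = {
    "base_model": "meta-llama/Llama-3.1-8B",
    "quantization": {"bits": 4, "quant_type": "nf4"},  # 4-bit NF4 base weights
    "lora": {"r": 16, "alpha": 32, "dropout": 0.05},   # placeholder values
    "epochs": 2,
    "batch_size": 16,
    "hardware": "A100",
}
```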

Each prompt encodes the factual vector $x_i$, the classifier prediction $\hat y_i$, and immutability constraints; the LLM is tasked with outputting a counterfactual sequence $x_i'$. The composite loss combines:

  • Counterfactual likelihood: $-\mathbb{E}_{(x_i, y_i) \sim \mathcal{D}_{\mathrm{train}}} \log P_\theta(x_i' \mid \mathrm{prompt}(x_i, \hat y_i))$
  • Plausibility regularizer: penalty for outputs violating empirical boundaries, $\mathbb{E}_{x_i'}[\mathbf{1}(P_{\mathrm{data}}(x_i') < \tau)]$
  • Distance penalty: optional, using $d(x_i, x_i')$

The total loss is:

$$\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{\mathrm{cf}} + \lambda_{\mathrm{plaus}} \mathcal{L}_{\mathrm{plaus}} + \lambda_{\mathrm{dist}} \mathcal{L}_{\mathrm{dist}}$$

with the $\lambda$ hyperparameters validated empirically (Soumma et al., 21 Jan 2026).

Data preprocessing is applied to features extracted from sensor time series (e.g., %TIR, hyperglycemic event counts, sleep-stage percentages, steps), using windowed summary statistics and serialization for LLM input (Soumma et al., 21 Jan 2026, Soumma et al., 7 Jul 2025).
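
The serialization step can be sketched as a simple formatter that turns windowed summary features into prompt text. The template wording, function name, and feature keys here are illustrative assumptions; the paper's exact prompt format is not reproduced.

```python
def serialize_prompt(features, prediction, immutable):
    """Serialize windowed summary features, the classifier prediction,
    and immutability constraints into a textual LLM prompt.
    Template wording is illustrative, not the paper's exact prompt."""
    feat_str = ", ".join(f"{k}={v}" for k, v in features.items())
    frozen = ", ".join(immutable)
    return (
        f"Patient features: {feat_str}. "
        f"Classifier prediction: {prediction}. "
        f"Propose a minimal counterfactual that flips the prediction "
        f"without changing: {frozen}."
    )

# Hypothetical sensor-summary features (%TIR, event counts, sleep, steps)
prompt = serialize_prompt(
    {"pct_TIR": 62.0, "hyper_events": 4, "deep_sleep_pct": 11.5, "steps": 4200},
    prediction=1,
    immutable=["age", "sex"],
)
```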

3. Evaluation Metrics and Comparative Performance

SenseCF utilizes four principal metrics for counterfactual evaluation:

| Metric | Definition/Computation |
| --- | --- |
| Validity | $\frac{1}{\lvert \mathrm{CF} \rvert} \sum_{x' \in \mathrm{CF}} \mathbf{1}(f(x') = y_{\mathrm{target}})$ |
| Sparsity | $\frac{1}{\lvert \mathrm{CF} \rvert} \sum_{x' \in \mathrm{CF}} \sum_{j=1}^{d} \mathbf{1}(x'_j \neq x_j)$ |
| Distance | Normalized $L_2$/Hamming metric as above; lower is better |
| Plausibility | $\frac{1}{\lvert \mathrm{CF} \rvert} \sum_{x' \in \mathrm{CF}} \mathbf{1}(P_{\mathrm{data}}(x') \geq \tau)$ |

Additional analysis includes per-feature coefficient of variation and manifold overlap via clustering projections (Soumma et al., 21 Jan 2026, Soumma et al., 7 Jul 2025).
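
The three set-level metrics in the table reduce to simple averages over (factual, counterfactual) pairs. A minimal sketch, assuming a list-of-pairs representation and callable `f` and `p_data` stand-ins:

```python
def evaluate_cfs(pairs, f, y_target, p_data, tau):
    """Compute validity, mean sparsity, and plausibility over a CF set.
    `pairs` is a list of (factual, counterfactual) feature tuples."""
    n = len(pairs)
    validity = sum(f(xc) == y_target for _, xc in pairs) / n
    sparsity = sum(sum(a != b for a, b in zip(x, xc)) for x, xc in pairs) / n
    plausibility = sum(p_data(xc) >= tau for _, xc in pairs) / n
    return {"validity": validity, "sparsity": sparsity,
            "plausibility": plausibility}
```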

On AI-READI data, SenseCF (fine-tuned LLaMA-3.1-8B) achieves near-perfect validity (0.99), low average feature changes (1.8–1.9), minimal distance ($\sim$0.2–0.4), and 99% plausibility, outperforming DiCE, CFNOW, NICE, and all zero-shot LLM baselines. In scarcity-driven augmentation scenarios, SenseCF provides a mean F1-score recovery of +22.4% (positive scarcity), +16.4% (negative scarcity), and +20.0% (dual scarcity), versus 8–12% for optimization-based methods (Soumma et al., 21 Jan 2026).

In a comparative study using GPT-4o-mini, three-shot prompting achieves 0.99 validity, 0.99 plausibility, and competitive sparsity/feature-distance compared to existing optimization solvers (Soumma et al., 7 Jul 2025).

4. Data Augmentation via Counterfactuals

SenseCF counterfactuals that pass validity and plausibility checks are assigned the target label and injected into the training set:

$$X_{\mathrm{train}}^{\mathrm{new}} = X_{\mathrm{train}} \cup \mathcal{X}_{\mathrm{aug}}$$

where $\mathcal{X}_{\mathrm{aug}} = \{ (x_i', y_i') \mid f(x_i') = y_{\mathrm{target}} \}$ (Soumma et al., 21 Jan 2026).

The augmentation protocol allows varying the ratio of synthetic minority samples between 20% and 100% of the original count, enabling controlled exploration of impact on downstream classifier performance. Empirically, consistent gains in classifier F1 and accuracy under strong label imbalance and severe data scarcity are observed, with LLM-based CFs outperforming existing solvers both in intervention utility and augmentation effectiveness (Soumma et al., 21 Jan 2026, Soumma et al., 7 Jul 2025).
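
The ratio-controlled injection step can be sketched as follows; the function name and the uniform sampling of validated counterfactuals are assumptions made for illustration, not the paper's exact protocol.

```python
import random

def augment_minority(train, cf_pool, minority_label, ratio, seed=0):
    """Inject validated counterfactuals, labeled with the target class,
    sampling up to `ratio` times the original minority count.
    `train` is a list of (features, label); `cf_pool` holds CFs that
    already passed validity and plausibility checks."""
    minority = [ex for ex in train if ex[1] == minority_label]
    k = min(len(cf_pool), int(ratio * len(minority)))
    rng = random.Random(seed)
    sampled = rng.sample(cf_pool, k)
    return train + [(x_cf, minority_label) for x_cf in sampled]
```

Sweeping `ratio` from 0.2 to 1.0 reproduces the 20%–100% augmentation schedule described above; keeping `cf_pool` derived only from training-set factuals preserves the train/test separation.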

Practical details include prompt serialization with explicit immutability constraints, batch generation on high-throughput hardware, and strict separation of augmented and test data to avoid leakage (Soumma et al., 7 Jul 2025).

5. Application Contexts and Scalability

SenseCF’s primary deployment is in digital health AI workflows, where it enables:

  • Personalized intervention design: CFs yield specific, classifier-validated modifications in modifiable physiological or behavioral features (e.g., raising deep sleep %, reducing average glucose, increasing steps) with demonstrated outcome reversal.
  • Robust data augmentation for imbalanced tasks: LLM-generated CFs supplement minority-class distribution, restoring classifier performance with dramatically less dependence on expensive data collection (Soumma et al., 21 Jan 2026).

In broader networked and sensor communication contexts, “SenseCF” also designates scalable cell-free massive MIMO ISAC systems. These architectures address integrated communication and distributed environmental sensing, leveraging AP clustering and decentralized scanning for scalability ($O(1)$ per-AP resource scaling, bounded fronthaul/CPU complexity) and high-rate, high-resolution multi-target detection (Elfiatoure et al., 2024, Buzzi et al., 2024). In NOA-based cell-free radio access, SenseCF labels RF-fingerprint-based passive ISAC, enabling meter-level location estimation without communication performance degradation (Yu et al., 2023).

6. Limitations and Future Research Directions

Limitations of SenseCF in its LLM-driven counterfactual framework include:

  • Dependence on structured features, excluding raw time-series or multimodal inputs.
  • Possible out-of-distribution risks when fine-tuning data are limited.
  • The necessity of downstream clinical validation for intervention realism (Soumma et al., 21 Jan 2026).

Ongoing and future research aims to:

  • Integrate causal knowledge constraints to further restrict the space of plausible counterfactuals.
  • Extend SenseCF to multimodal LLM architectures for imaging and natural language sensor data.
  • Enable continuous, “LLM-in-the-loop” classifier retraining with ongoing CF augmentation.
  • Expand to network-level joint sensing-communication optimization and machine-learning-driven resource management in CF-ISAC systems (Galappaththige et al., 27 Feb 2025, Soumma et al., 21 Jan 2026).

In summary, SenseCF unifies high-fidelity, minimally invasive counterfactual explanation with scalable data augmentation, and demonstrates state-of-the-art effectiveness in clinical and sensor-based digital health, while offering a model-agnostic and interpretable pathway to robust, data-efficient predictive systems (Soumma et al., 21 Jan 2026, Soumma et al., 7 Jul 2025).
