CAC-CoT: Efficient Compact Chain-of-Thought
- The paper presents CAC-CoT, a method that uses a fixed set of connector phrases to generate compact, coherent reasoning traces for LLMs.
- CAC-CoT achieves state-of-the-art accuracy-efficiency trade-offs, reducing average reasoning tokens by up to 75% compared to traditional methods.
- The study details algorithmic constraints and a synthetic data generation protocol that ensure high-quality, cognitively inspired reasoning aligned with dual-process theory.
Connector-Aware Compact Chain-of-Thought (CAC-CoT) is a method for generating and fine-tuning LLMs using concise, connector-regulated reasoning traces. Motivated by the observation that long chain-of-thought (CoT) prompting can degrade efficiency and even accuracy on fast, intuitive "System-1" tasks, CAC-CoT enforces brevity by restricting reasoning to a small, fixed set of connector phrases. This deliberate constraint yields highly efficient and coherent reasoning suitable for both System-1 (fast, heuristic) and System-2 (analytical, deliberative) cognitive tasks, aligning with dual-process theory. CAC-CoT demonstrates state-of-the-art accuracy-efficiency trade-offs in empirical benchmarks, while producing reasoning traces that are substantially shorter than typical CoT outputs (Choi et al., 26 Aug 2025).
1. Connector Phrase Formalism
Connector-Aware Compact CoT introduces two disjoint, finite sets of short connector phrases:
- C⁻, the "incorrect connectors" (e.g., "However, this might not be the right path because …", "Hmm, that might be a dead end.")
- C⁺, the "correct connectors" (e.g., "Now that’s convincing, it really does.", "Everything fits together nicely.")
Each connector is a contiguous token sequence satisfying: (a) semantic signaling of either uncertainty/re-evaluation (C⁻) or confirmation/advancement (C⁺), and (b) syntactic validity for insertion between reasoning steps.
Generation strictly samples connectors from these lists, each of cardinality 20 in the described implementation. This connector selection ensures that reasoning traces are compact, well-structured, and punctuated by cognitively meaningful checkpoints.
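As an illustrative sketch (the function name `sample_connector` is hypothetical, and only the phrases quoted above are included — the actual implementation uses two fixed lists of 20 phrases each), the strict sampling rule can be expressed as:

```python
import random

# Two disjoint, fixed pools; only the phrases quoted in the text are
# listed here -- the real implementation has 20 per list.
INCORRECT_CONNECTORS = [
    "However, this might not be the right path because ...",
    "Hmm, that might be a dead end.",
]
CORRECT_CONNECTORS = [
    "Now that's convincing, it really does.",
    "Everything fits together nicely.",
]

def sample_connector(confirming: bool, rng=random) -> str:
    """Strictly sample from the fixed lists; free-form connector text
    is never generated."""
    pool = CORRECT_CONNECTORS if confirming else INCORRECT_CONNECTORS
    return rng.choice(pool)
```

Because generation only ever draws from these pools, every connector in a trace is machine-checkable by simple set membership.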
2. Algorithmic Constraints and Mathematical Formulation
A generated reasoning trace of T tokens adheres to several hard constraints. Define:
- N_c: the total number of connectors in the trace.
- N_r: the number of re-validation triggers arising from incorrect connectors.
The constraints are:
- Length: T may not exceed a fixed token budget.
- Connector bound: N_c is capped at a small constant, i.e., only a handful of connectors even in a maximum-length trace.
- No consecutive connectors: no two connectors may occupy adjacent reasoning steps.
- Validation: each incorrect connector must trigger a re-validation of the current reasoning path (counted by N_r).
- Early termination: if the answer repeats beyond a set count or the character length exceeds a cap, the generator outputs “Reasoning failed…” and aborts.
Connector density is thus bounded to a few connectors per thousand tokens (the CAC-CoT training corpus averages 2.65 per 1,000 tokens).
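A minimal checker for these hard constraints — length cap, connector cap, no adjacent connectors, and mandatory re-validation after an incorrect connector — might look as follows. The thresholds are parameters because the concrete caps are implementation-specific, and `satisfies_constraints` is a hypothetical name:

```python
def satisfies_constraints(steps, max_tokens, max_connectors,
                          incorrect_set, correct_set):
    """Validate a reasoning trace, given as a list of step strings.
    Thresholds are passed in because the concrete caps are
    implementation-specific."""
    connectors = incorrect_set | correct_set
    is_conn = [s in connectors for s in steps]

    # Length bound (whitespace tokenization as a crude proxy).
    if sum(len(s.split()) for s in steps) > max_tokens:
        return False
    # Connector bound.
    if sum(is_conn) > max_connectors:
        return False
    # No two connectors in adjacent steps.
    if any(a and b for a, b in zip(is_conn, is_conn[1:])):
        return False
    # Every incorrect connector must be followed by a re-validation step.
    for i, s in enumerate(steps):
        if s in incorrect_set and i == len(steps) - 1:
            return False
    return True
```

The check runs in a single pass over the trace, so it can gate generation with negligible overhead.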
3. Synthetic Data Generation Protocol
Synthetic corpora for CAC-CoT are generated with the Gemini-2.0-Flash LLM following the paper's Algorithm 1: each problem is submitted to the teacher model, the resulting trace is checked against the hard constraints of Section 2, and traces that fail the check are regenerated. The function ConstraintsSatisfied enforces those constraints together with format integrity.
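The generation loop amounts to rejection sampling against the constraint checker. The names below (`llm_generate`, `constraints_ok`, `generate_cac_cot_corpus`, the retry count) are hypothetical stand-ins for the paper's teacher-model call and ConstraintsSatisfied:

```python
def generate_cac_cot_corpus(problems, llm_generate, constraints_ok,
                            max_retries=3):
    """For each problem, ask the teacher LLM (Gemini-2.0-Flash in the
    paper) for a connector-regulated trace; keep it only if every hard
    constraint passes, otherwise retry and finally emit the explicit
    failure marker."""
    corpus = []
    for problem in problems:
        for _ in range(max_retries):
            trace = llm_generate(problem)
            if constraints_ok(trace):
                corpus.append({"problem": problem, "trace": trace})
                break
        else:  # all retries exhausted
            corpus.append({"problem": problem,
                           "trace": "Reasoning failed..."})
    return corpus
```

Rejection sampling keeps the pipeline simple: no constraint ever needs to be expressed inside the prompt alone, since non-conforming traces are filtered post hoc.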
4. Comparative Analysis: Trace Length, Accuracy, Efficiency
CAC-CoT training traces average 1,843 tokens, about one-fifth the size of s1-1.1 (9,292 tokens), and have the lowest connector density (2.65 per 1,000 tokens). The following table summarizes corpus statistics:
| Dataset | Len_avg | Connectors/1K | Samples |
|---|---|---|---|
| s1-1.1 | 9,291.6 | 5.55 | 1,000 |
| LIMO | 6,984.1 | 2.97 | 800 |
| Bespoke | 4,452.2 | 5.13 | 16,700 |
| CAC-CoT | 1,843.4 | 2.65 | 1,391 |
On System-1 benchmarks (S1-Bench), CAC-CoT-7B achieves the highest accuracy@5 (ACC@5) at 86.07% with the shortest average reasoning tokens (ART 286), roughly a 75% reduction in trace length versus s1.1-7B (ART 1,138). System-2 benchmarks reveal minimal loss in mathematical task accuracy, with about 5 points lost on GSM8K and about 1 point on GPQA, despite roughly one-third the ART of baselines.
| Model | Pass@1 | ACC@5 | ART |
|---|---|---|---|
| s1.1-7B | 99.25% | 68.03% | 1,138 |
| LIMO-7B | 87.30% | 49.05% | 1,140 |
| Bespoke-7B | 95.79% | 76.57% | 547 |
| CAC-CoT-7B | 98.79% | 86.07% | 286 |
| Benchmark | GSM8K | GPQA | AMC23 | AIME24 | Math500 | AVG |
|---|---|---|---|---|---|---|
| s1.1-7B | 90.67% | 39.39% | 55.00% | 13.33% | 79.40% | 55.55% |
| CAC-CoT-7B | 85.37% | 38.38% | 50.00% | 10.00% | 68.00% | 50.35% |
Scatter plots confirm CAC-CoT outputs cluster at lower trace lengths and fewer connectors. Lower connector redundancy correlates with less repetition and no decrease in System-1 accuracy.
5. Experimental Methodology and Benchmarks
CAC-CoT training leverages the Qwen-2.5-7B-Instruct model, fine-tuned for 5 epochs with the AdamW optimizer, a cosine learning-rate scheduler, batch size 1 with gradient accumulation of 4, a block size of 4,000 tokens, and weight decay. Experiments use 4 NVIDIA A100-80GB GPUs.
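For concreteness, the reported setup can be collected into a configuration sketch; hyperparameters not stated in this summary are left as `None` placeholders rather than guessed:

```python
# Hedged training-configuration sketch; None marks hyperparameters
# whose concrete values are not recoverable from this summary.
TRAIN_CONFIG = {
    "base_model": "Qwen-2.5-7B-Instruct",
    "epochs": 5,
    "optimizer": "AdamW",
    "adam_betas": None,        # not stated here
    "learning_rate": None,     # not stated here
    "lr_scheduler": "cosine",
    "per_device_batch_size": 1,
    "gradient_accumulation_steps": 4,
    "block_size": 4000,
    "weight_decay": None,      # not stated here
    "hardware": "4x NVIDIA A100-80GB",
}
```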
Evaluation employs both System-1 and System-2 benchmarks:
- S1-Bench: Analysis, instruction, knowledge, and reasoning subtasks.
- System-2: AMC23, AIME24, GSM8K, GPQA Diamond, Math500.
Metrics include Pass@1, Accuracy@5, Success (all steps correct), and ART (average reasoning tokens). Results highlight that lower connector density provides a superior accuracy-efficiency trade-off compared to previous baselines.
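Under one common reading of these metrics — consistent with ACC@5 being lower than Pass@1 in the table above, i.e., ACC@5 requires all five sampled answers to be correct — they can be computed as follows (function names are illustrative):

```python
def pass_at_1(first_correct):
    """Pass@1: fraction of questions whose first sample is correct.
    `first_correct` is a list of booleans, one per question."""
    return sum(first_correct) / len(first_correct)

def acc_at_k(samples_correct):
    """ACC@k (strict reading): fraction of questions for which ALL k
    sampled answers are correct. Each element of `samples_correct` is
    a list of k booleans for one question."""
    return sum(all(s) for s in samples_correct) / len(samples_correct)

def avg_reasoning_tokens(token_counts):
    """ART: mean number of reasoning tokens per trace."""
    return sum(token_counts) / len(token_counts)
```

The strict ACC@k reading explains why a model can score near-perfect Pass@1 yet much lower ACC@5: a single unstable sample out of five fails the question.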
6. Limitations, Potential Extensions, and Cognitive Implications
Notable limitations:
- The method uses a single backbone (Qwen-2.5-7B); effects on models like LLaMA or OPT remain untested.
- Synthetic data is generated from a single LLM (Gemini-2.0-Flash), which may introduce stylistic biases.
- Connector lists are manually curated; there is no automatic connector-selection policy.
Potential extensions include:
- Ensemble-based data generation to mitigate source bias.
- Learnable connector-injection policies via reinforcement learning.
- Dynamic or embedding-based connector sets.
- Application to domains beyond mathematics, such as commonsense or code reasoning.
From a cognitive perspective, CAC-CoT explicitly models compact, "System 1" inference punctuated by "System 2" reflective connectors. This design enables LLMs to achieve both rapid, heuristic task performance and strategic, explicit checkpointing, paralleling Kahneman's dual-process theory. A plausible implication is enhanced flexibility in LLM reasoning, balancing efficient intuition and periodic verification.
7. Significance and Research Context
CAC-CoT constitutes a prompt-based recipe for training models on concise, yet coherent, explicit reasoning traces. It addresses the verbosity and inefficiency of traditional CoT prompting without sacrificing task accuracy. By systematically constraining trace length and connector usage, it represents a data-efficient, cognitively motivated approach to LLM reasoning synthesis across dual-system tasks. This work aligns with a growing body of research on efficient, interpretable, and cognitively inspired LLM prompting and data generation (Choi et al., 26 Aug 2025).