Selective Rewriting Approach
- Selective rewriting is a method that focuses on transforming specific fragments—such as extracted sentences or flawed queries—to preserve semantic integrity while enhancing performance.
- The approach employs techniques like group tagging, rule matching, and learned classifiers to ensure transformations are accurate and contextually relevant.
- Empirical results across domains show improvements in metrics like ROUGE scores, execution accuracy, and processing speed, demonstrating its practical impact.
Selective rewriting refers to a suite of methodologies that identify and transform only specific portions, fragments, or queries within a larger computational, linguistic, or logical context, rather than rewriting entire objects. By targeting localized regions—such as extracted summary sentences, flawed queries, code idioms, or subgraphs—these approaches achieve finer control over semantic fidelity, efficiency, and optimization objectives. This principle underpins recent advances across text summarization (Bao et al., 2021, Bao et al., 2022), dialog systems (Tanjim et al., 26 Feb 2025), logic synthesis (Ni et al., 2023), answer set programming (Dingess et al., 2020, Mastria et al., 2020), query optimization (Dharwada et al., 18 Feb 2025, Wang et al., 24 Jun 2025), compiler transformations (Couto et al., 2022), and large reasoning model (LRM) training (Yao et al., 20 Nov 2025), among others.
1. Foundational Concepts of Selective Rewriting
Selective rewriting approaches formalize the transformation process via explicit selection mechanisms that govern which source fragments are eligible and how they are aligned with target rewrites. In text summarization, extracted sentences function as anchors, with models learning fine-grained group alignments from source to rewritten output (Bao et al., 2021, Bao et al., 2022). Dialog systems operationalize history selection by constraining rewrite prompts to the most relevant previous exchanges, often parametrized by a context window (Tanjim et al., 26 Feb 2025). In logic synthesis, reconvergence-driven rewriting applies only to selected cones in an AIG, optimized via learning-driven strategy selection (Ni et al., 2023). Compiler optimization and code rewriting detect idioms using automaton-encoded control and data dependency graphs, restricting transformation to matched fragments (Couto et al., 2022). On the reasoning side, selective self-rewriting operates only on samples satisfying correctness consistency, thereby mitigating reward instability in RL training (Yao et al., 20 Nov 2025).
Such selectivity is mathematically encoded via group-tag sequences, explicit classifier outputs, automaton acceptance states, rule-matching patterns, or correctness predicates, depending on the domain.
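The common shape of these mechanisms can be sketched generically. The Python fragment below (all names hypothetical, not drawn from any of the cited systems) gates a rewrite function behind an eligibility predicate, so only selected fragments are transformed and everything else passes through untouched.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Fragment:
    """A candidate unit for rewriting: a sentence, query, AIG cone, rule, subgraph, ..."""
    content: str
    features: Dict[str, bool] = field(default_factory=dict)

def selective_rewrite(
    fragments: List[Fragment],
    is_eligible: Callable[[Fragment], bool],   # group tag, rule match, classifier, predicate
    rewrite: Callable[[Fragment], Fragment],
) -> List[Fragment]:
    """Rewrite only the fragments that satisfy the eligibility predicate."""
    return [rewrite(f) if is_eligible(f) else f for f in fragments]

# Example: rewrite only fragments flagged by an upstream extractor or checker.
frags = [Fragment("summarise me", {"flagged": True}),
         Fragment("leave me alone", {"flagged": False})]
out = selective_rewrite(frags,
                        is_eligible=lambda f: f.features.get("flagged", False),
                        rewrite=lambda f: Fragment(f.content.upper(), f.features))
```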
2. Formalization and Algorithmic Mechanisms
The formal structure of selective rewriting varies by field but is unified by an eligibility predicate that determines which fragments may be transformed. Exemplary frameworks include:
- Contextualized Rewriting (text summarization): The model maximizes the conditional probability of the rewritten summary $y$ and its group alignments $g$ given the full document $x$:

$$\hat{\theta} = \arg\max_{\theta} \; P(y, g \mid x; \theta)$$

Group tags are assigned to both source and target tokens and injected via learnable embeddings, enabling cross-attention to target only the extracted sentences during decoding (Bao et al., 2021, Bao et al., 2022); a minimal sketch of this group-tag masking appears after this list.
- Dialog Query Rewriting: For a dialogue history of user utterances $u_1, \dots, u_t$ and system responses $r_1, \dots, r_{t-1}$, the rewrite function operates only over the last $k$ turns:

$$\hat{u}_t = \mathrm{Rewrite}\left(u_t \mid u_{t-k}, r_{t-k}, \dots, u_{t-1}, r_{t-1}\right)$$

No internal weighting or scoring is required; selection is purely positional (Tanjim et al., 26 Feb 2025).
- Reconvergence-Driven Logic Synthesis: Nodes are processed in topological order, and for each reconvergence cone, a learned classifier chooses among rewriting strategies (iSOP, Exact, NPN), adapting locally rather than globally (Ni et al., 2023).
- Compilers and Source Matching: A two-phase automaton-based DAG matcher first filters candidates by control-flow similarity (CDG), then applies data dependency checks (DDG), finally rewriting only matched regions as higher-level calls or idiom replacements (Couto et al., 2022).
- ML-Guided ASP Rule Decomposition: Structural feature vectors drive an MLP that outputs one of three labels (“decomp”, “do-not-decomp”, “indifferent”). Only rules flagged “decomp” are tree-decomposed; the rest are left untouched (Mastria et al., 2020).
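As one concrete instance, the sketch below illustrates the group-tag mechanism from the first bullet: it builds a boolean cross-attention mask so that each rewritten-summary token may attend only to source tokens of the extracted sentence it rewrites. Tag values, tensor shapes, and the masking rule are illustrative assumptions, not the exact architecture of Bao et al.

```python
import numpy as np

def group_attention_mask(src_group_tags, tgt_group_tags):
    """Boolean cross-attention mask: summary token i may attend to document
    token j only if both carry the same non-zero group tag, i.e. token j lies
    in the extracted sentence that token i is rewriting (tag 0 = not extracted)."""
    src = np.asarray(src_group_tags)            # shape [src_len]
    tgt = np.asarray(tgt_group_tags)            # shape [tgt_len]
    return (tgt[:, None] == src[None, :]) & (src[None, :] != 0)   # [tgt_len, src_len]

# Document of 6 tokens: tokens 0-2 form extracted sentence 1, tokens 3-4 are
# not extracted (tag 0), token 5 forms extracted sentence 2.
src_tags = [1, 1, 1, 0, 0, 2]
# Rewritten summary of 4 tokens: first two rewrite sentence 1, last two sentence 2.
tgt_tags = [1, 1, 2, 2]
print(group_attention_mask(src_tags, tgt_tags).astype(int))
```

Rows of the printed mask correspond to summary tokens and columns to document tokens; the non-extracted document tokens yield all-zero columns and are never attended.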
3. Selectivity in Data-to-Text, Query, and Logic Applications
Numerous systems operationalize selectivity for task-specific gains:
- Text Summarization: Models ingest the full document but rewrite only sentences flagged by an extractive phase, leveraging group-tag alignment for cross-attention and redundancy control, without requiring reinforcement learning (Bao et al., 2021, Bao et al., 2022).
- Natural Language to SQL (REWRITER): A "Checker" uses deterministic rules to identify NL inputs likely to lead to incorrect SQL; only flagged queries pass to a "Reflector" and "Rewriter," which correct the NL before final SQL generation (Ma et al., 22 Dec 2024); a sketch of this gated pipeline follows the list. Ablation demonstrates that rewriting all queries without selection decreases accuracy.
- Query Optimization (LITHE, SAGE): In SQL rewriting and neural IR query rewrites, selectivity is achieved by applying rule-based or strategy-guided transformations only when optimizer statistics indicate redundancy or performance bottlenecks. Reward shaping and explicit strategy selection dramatically reduce response latency and improve metric scores without over-modification (Dharwada et al., 18 Feb 2025, Wang et al., 24 Jun 2025).
- Compiler Source Rewriting (SMR): Only subgraphs matching user-supplied idiom patterns—detected through automaton-matched CDG/DDG traces—are rewritten, limiting transformation scope and preventing spurious modifications (Couto et al., 2022).
- Logic Synthesis and ASP: In reconvergence-driven AIG synthesis, local features and learned classifiers ensure rewriting is localized; in ML-guided ASP, data-driven heuristics avoid decomposing rules whose structure does not benefit grounding/performance (Ni et al., 2023, Mastria et al., 2020).
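A minimal sketch of the checker-gated NL-to-SQL flow from the REWRITER bullet, assuming hypothetical `checker`, `reflect_and_rewrite`, and `selective_nl2sql` helpers with placeholder heuristics (the actual rules and prompts in Ma et al. differ):

```python
from typing import Callable, List

def checker(nl_query: str) -> bool:
    """Deterministic rules that flag NL questions likely to yield wrong SQL.
    (Placeholder heuristics; the real REWRITER rules are different.)"""
    vague_markers = ("recently", "best", "a lot", "top ones")
    return any(m in nl_query.lower() for m in vague_markers)

def reflect_and_rewrite(nl_query: str, llm: Callable[[str], str]) -> str:
    """Ask an LLM to diagnose and repair only the flagged NL question."""
    prompt = ("The following question may lead to incorrect SQL. "
              "Rewrite it to be explicit and unambiguous:\n" + nl_query)
    return llm(prompt)

def selective_nl2sql(queries: List[str],
                     llm: Callable[[str], str],
                     nl2sql: Callable[[str], str]) -> List[str]:
    """Rewrite only flagged questions, then run NL-to-SQL on everything."""
    repaired = [reflect_and_rewrite(q, llm) if checker(q) else q for q in queries]
    return [nl2sql(q) for q in repaired]

# Stub usage: swap the lambdas for real LLM and text-to-SQL calls.
sql = selective_nl2sql(
    ["Show the best sellers recently", "List all orders from 2023"],
    llm=lambda p: p.splitlines()[-1] + " in the last 30 days, by revenue",
    nl2sql=lambda q: f"-- SQL for: {q}")
```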
4. Selective Rewriting in Reinforcement Learning and Reasoning Training
Recent frameworks in LRM training apply selective rewriting in the RL loop as a form of internal supervision:
- Self-Rewriting with Selective Sample Choice: Only those samples whose first batch of reasoning generations attain unanimous correctness are selected for rewriting and further RL updates; the remainder follow vanilla PPO-style generation (a schematic of this filter follows the list). This policy preserves reward-signal stability and achieves an improved accuracy–length trade-off (+0.6 accuracy, –46% length) and reduced internal reasoning pathologies (over-thinking, redundancy, disordered thinking) compared to random or full rewriting (Yao et al., 20 Nov 2025).
- Strategy-Guided Credit Assignment: In query rewriting for information retrieval, agents choose a human-defined strategy per rollout, receive batchwise credit assignment using Strategic Credit Shaping or Contrastive Reward Shaping, and are penalized for copying the original, thereby focusing rewriting on only effective query modifications (Wang et al., 24 Jun 2025).
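A schematic of the correctness-consistency filter behind selective self-rewriting, with hypothetical `generate` and `is_correct` callables standing in for the rollout sampler and verifier; this is a simplification, not the training loop of Yao et al.:

```python
from typing import Callable, List, Sequence, Tuple

def select_for_rewriting(
    prompts: Sequence[str],
    generate: Callable[[str, int], List[str]],   # sample n reasoning traces per prompt
    is_correct: Callable[[str, str], bool],      # verify one trace against its prompt
    n_samples: int = 4,
) -> Tuple[List[Tuple[str, List[str]]], List[str]]:
    """Partition prompts: those whose first batch of generations is unanimously
    correct are eligible for self-rewriting; the rest stay on the vanilla RL path."""
    rewrite_pool: List[Tuple[str, List[str]]] = []
    vanilla_pool: List[str] = []
    for prompt in prompts:
        traces = generate(prompt, n_samples)
        if traces and all(is_correct(prompt, t) for t in traces):  # unanimous correctness
            rewrite_pool.append((prompt, traces))   # safe to rewrite (e.g. condense)
        else:
            vanilla_pool.append(prompt)             # unstable reward signal: leave as-is
    return rewrite_pool, vanilla_pool
```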
5. Practical Impact and Experimental Validation
Empirical results confirm the strong performance benefits of selective rewriting across domains:
| System/Domain | Selectivity Mechanism | Main Performance Impact |
|---|---|---|
| Contextualized Rewriter (Bao et al., 2021, Bao et al., 2022) | Group-tag addressing; extracted sentences only | +1–3 ROUGE vs. non-context; better redundancy control |
| REWRITER NL→SQL (Ma et al., 22 Dec 2024) | Deterministic checker; flawed NL only | +1.6–2.0% execution accuracy vs. unfiltered rewriting |
| ML-guided ASP (Mastria et al., 2020) | Six-feature MLP, per-rule decision | Macro-F1 = 0.88; no end-to-end performance degradation |
| SAGE IR Query Rewrite (Wang et al., 24 Jun 2025) | Strategy selection; credit shaping | +4.8 NDCG@10 on HotpotQA; -80% tokens; higher modification rate |
| Self-rewriting RL (Yao et al., 20 Nov 2025) | Correctness-based sample filter | +0.6 accuracy; Judge score +7.2 vs. prior RL baselines |
| SMR MLIR Compiler (Couto et al., 2022) | CDG/DDG automaton matching | CBLAS idioms matched, 5–295× execution speed-ups |
Ablation and comparative studies demonstrate that indiscriminate rewriting (“rewrite all”) often degrades accuracy or efficiency, substantiating the need for fine-grained, eligibility-aware methods.
6. Theoretical Properties, Guarantees, and Limitations
Selective rewriting methodologies frequently provide formal guarantees of semantic preservation. In text summarization, group-tag guided rewriting achieves segment-wise faithfulness and redundancy management while enabling statistical improvements without auxiliary reward signals (Bao et al., 2021, Bao et al., 2022). In logic synthesis, learned rewriting strategies (from Q-learning labels) guarantee PPA optimization within theoretical bounds (Ni et al., 2023). Verified clause processors (RP-Rewriter) offer semantic and syntactic invariants, controlled backchaining, and termination for large terms and proofs (Temel, 2020). Counting-aggregate rewritings in ASP maintain strong equivalence under splitting and projection transformations (Dingess et al., 2020). Compiler idiom matchers deliver structural preservation via automaton-based verification (Couto et al., 2022).
Known limitations include static feature reliance (ML-guided ASP), dependence on domain-specific pattern encodings (compiler rewriting), and class imbalance/coverage trade-offs. Nevertheless, extensible frameworks and modular classifier integration enable ongoing improvement.
7. Connections to Rewriting Logic and Strategy Formalisms
The rewriting community formalizes "selectivity" via strategy objects: proof terms (Rewriting Logic), extensional subsets of derivations (Abstract Reduction Systems), or intensional partial functions on traces (Kirchner, 2013). These frameworks capture the distinction between local and global control, enabling users to express rewrite steps conditioned on history, term structure, or external metrics. Modern selective rewriting systems instantiate these principles in practical algorithms, bridging theoretical expressivity and applied optimization (Kirchner, 2013).
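To illustrate the trace-level view of strategies, the toy sketch below treats a strategy as a partial function on the derivation so far (history-aware, possibly undefined) over a small abstract reduction system on strings; the rules, names, and no-revisit policy are invented for illustration.

```python
from typing import Callable, List, Optional

# A strategy is an intensional, partial function on the trace so far:
# given the derivation history, it either picks the next term or stops (None).
Strategy = Callable[[List[str]], Optional[str]]

def derive(term: str, strategy: Strategy, max_steps: int = 10) -> List[str]:
    """Unfold a derivation under the control of a history-aware strategy."""
    trace = [term]
    for _ in range(max_steps):
        nxt = strategy(trace)
        if nxt is None:          # strategy undefined here: derivation stops
            break
        trace.append(nxt)
    return trace

# Toy abstract reduction system on strings:
# rule 1 rewrites the first "ab" to "ba"; rule 2 contracts the first "bb" to "b".
def successors(t: str) -> List[str]:
    out = []
    i = t.find("ab")
    if i >= 0:
        out.append(t[:i] + "ba" + t[i + 2:])
    j = t.find("bb")
    if j >= 0:
        out.append(t[:j] + "b" + t[j + 2:])
    return out

# Strategy conditioned on history: prefer the shortest successor, never revisit.
def shortest_no_revisit(trace: List[str]) -> Optional[str]:
    for cand in sorted(successors(trace[-1]), key=len):
        if cand not in trace:
            return cand
    return None

print(derive("abab", shortest_no_revisit))
# ['abab', 'baab', 'baba', 'bbaa', 'baa']
```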
In summary, selective rewriting is a foundational paradigm shared across computational linguistics, logic synthesis, knowledge representation, database optimization, compiler transformations, and reinforcement learning. By targeting only eligible source fragments and leveraging algorithmic, statistical, or logic-driven mechanisms, these approaches maximize conciseness, fidelity, and performance while maintaining tight semantic guarantees and minimizing unintended transformations.