
Ledom: Reverse Language Model

Updated 6 July 2025
  • Ledom is a reverse language model that processes text in reverse order to predict previous tokens based on future context.
  • It employs innovations like Multi-Query Attention, RoPE, RMSNorm, and SwiGLU activations and is available in 2B and 7B parameter variants.
  • Its Reverse Reward mechanism refines forward model outputs to enhance multi-step reasoning and ensure logical consistency.

Ledom is a reverse language model (RLM) that fundamentally departs from conventional forward autoregressive language models (FLMs) by processing and generating text in reverse temporal order. Trained on 435 billion tokens and released in 2B and 7B parameter variants, Ledom is designed to predict previous tokens from their future context, establishing a new modeling paradigm in natural language processing. It introduces methodological innovations in training, inference, and hybrid reward mechanisms, and the open release of its models, training code, and pre-training data positions it as a foundational resource for both research and applied domains.

1. Reverse Language Modeling: Training Objective and Methodology

Ledom operates as the first "purely reverse" language model, differing fundamentally from FLMs in both data ordering and conditional dependencies. During training, input sequences $x = (x_1, x_2, \dots, x_T)$ are tokenized with standard FLM tokenizers and then reversed to form $(x_T, \dots, x_1)$. The autoregressive learning objective is to predict each token given the tokens that follow it in the original sequence:

P_{RLM}(x) = \prod_{t=1}^{T} P(x_t \mid x_{t+1}, x_{t+2}, \dots, x_T; \theta_{RLM})

By contrast, a conventional FLM factorizes the sequence probability as:

P_{FLM}(x) = \prod_{t=1}^{T} P(x_t \mid x_1, x_2, \dots, x_{t-1}; \theta_{FLM})

As a consequence, Ledom's hidden state $h_t$ at token $t$ incorporates context that lies in the future under the original ordering. All model parameters are trained via backpropagation with gradients that flow "backward in time," aligning with the reversed data presentation. The base architecture remains a Transformer decoder, identical in structural design to contemporary FLMs but incorporating enhancements such as Multi-Query Attention, Rotary Positional Embeddings (RoPE), RMSNorm, and SwiGLU activations. The key inductive bias distinguishing Ledom lies in the sequence reversal during both data preparation and modeling.
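Because the reversal happens during data preparation, the objective can be implemented with an unmodified causal decoder. Below is a minimal sketch of the training loss under that assumption; `model` here is a placeholder for any Hugging Face-style decoder that returns `.logits`, not Ledom's released code:

```python
import torch
import torch.nn.functional as F

def reverse_lm_loss(model, input_ids: torch.Tensor) -> torch.Tensor:
    """Next-token loss on reversed sequences, so the decoder learns
    P(x_t | x_{t+1}, ..., x_T) with respect to the original order."""
    rev = torch.flip(input_ids, dims=[1])      # (x_T, ..., x_1)
    logits = model(rev).logits                 # unmodified causal decoder
    # In reversed order, position i predicts token i+1, which is the
    # *previous* token of the original sequence.
    shift_logits = logits[:, :-1, :]
    shift_labels = rev[:, 1:]
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
    )
```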

2. Parameter Variants and Architectural Differences

Ledom is instantiated in two parameter scales, each designed for different research and deployment scenarios. The specifications are detailed in the original work's Table 1:

Model      Layers  Attention Heads  Model Dim.  FFN Dim.
Ledom-2B   18      8                2048        16,384
Ledom-7B   28      16               3072        24,576

The 2B variant is well-suited to resource-constrained use cases, while the 7B model demonstrates enhanced performance, particularly on complex reasoning tasks. The trade-off between model capacity and output structure is noted: the larger model offers improved prediction capability, but may generate outputs that align less well with typical forward-language task constraints. These variants provide an empirical framework for investigating the effects of model scale and reverse processing on performance.
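For concreteness, the two scales can be captured in a small configuration object; the field names below are hypothetical, but the values follow Table 1:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LedomConfig:
    n_layers: int   # Transformer decoder layers
    n_heads: int    # attention heads (Multi-Query Attention)
    d_model: int    # hidden dimension
    d_ffn: int      # feed-forward (SwiGLU) dimension

LEDOM_2B = LedomConfig(n_layers=18, n_heads=8, d_model=2048, d_ffn=16_384)
LEDOM_7B = LedomConfig(n_layers=28, n_heads=16, d_model=3072, d_ffn=24_576)
```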

3. Reverse Reward: Bidirectional Reranking and Posterior Evaluation

A novel practical application enabled by Ledom is the Reverse Reward mechanism, which uses Ledom's backward reasoning capacity to refine the outputs of forward models. Given a prompt $x$ and an FLM-generated candidate $y$, Ledom computes the reverse likelihood of $x$ conditioned on $y$:

R_{RLM}(x, y) = \prod_{t=1}^{T} P_{RLM}(x_t \mid x_{t+1:T}, y; \theta_{Ledom})

This reverse score is combined with the candidate's forward probability via a bidirectional reward formula:

R(x, y) = \left[P_{FLM}(y \mid x; \theta_{FLM})\right]^{1-\lambda} \cdot \left[R_{RLM}(x, y)\right]^{\lambda}

where $\lambda \in [0,1]$ controls the contribution of each model. When applied to mathematical reasoning benchmarks (e.g., GSM8K), reranking candidate solutions or intermediate steps with Reverse Reward yields substantially improved answer quality and more coherent stepwise inference. This workflow utilizes Ledom's posterior evaluation to support selection and beam-search strategies, particularly in domains where logical consistency over sequential reasoning steps is required.
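In log space, the bidirectional reward reduces to a λ-weighted sum of the two log-likelihoods. The following sketch reranks candidates under that formulation; `flm_logprob` and `rlm_logprob` are assumed helper functions standing in for scoring passes over the forward and reverse models:

```python
from typing import Callable, List, Tuple

def reverse_reward_rerank(
    x: str,
    candidates: List[str],
    flm_logprob: Callable[[str, str], float],   # log P_FLM(y | x)
    rlm_logprob: Callable[[str, str], float],   # log R_RLM(x, y)
    lam: float = 0.5,                           # lambda in [0, 1]
) -> List[Tuple[str, float]]:
    """Rerank candidates y by the log of the bidirectional reward:
    log R(x, y) = (1 - lam) * log P_FLM(y | x) + lam * log R_RLM(x, y)."""
    scored = [
        (y, (1.0 - lam) * flm_logprob(y, x) + lam * rlm_logprob(x, y))
        for y in candidates
    ]
    return sorted(scored, key=lambda t: t[1], reverse=True)
```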

4. Distinctive Capabilities and Task-Specific Behaviors

Ledom's reverse generation methodology results in distinct behavioral and inferential properties:

  • Abductive Reasoning: Leveraging future context to reconstruct plausible antecedents, Ledom exhibits strong abductive capabilities.
  • Backward Story Generation: Given a narrative endpoint, Ledom can generate preceding events, facilitating applications in story reconstruction, question synthesis, and augmented data generation (see the sketch after this list).
  • Enhanced Multi-step Mathematical Reasoning: By reasoning from solution to problem, Ledom effectively hypothesizes possible intermediate steps, improving performance on tasks structured as outcome inference.
  • Limitations: For tasks necessitating strict incremental forward progression (such as code generation), reverse modeling may introduce structural constraints or reduced fluency. This complementary behavior highlights the potential for hybrid FLM-RLM systems that balance the strengths of both paradigms.
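As referenced above, backward story generation falls out of ordinary decoding once inputs and outputs are reversed. Here is a minimal sketch, assuming a Hugging Face-style `model`/`tokenizer` pair trained on reversed sequences (the names are illustrative, not Ledom's API):

```python
import torch

def generate_prefix(model, tokenizer, ending: str, max_new_tokens: int = 128) -> str:
    """Generate text that plausibly precedes `ending`."""
    ids = tokenizer(ending, return_tensors="pt").input_ids
    rev = torch.flip(ids, dims=[1])                # feed the ending reversed
    out = model.generate(rev, max_new_tokens=max_new_tokens)
    new_tokens = out[:, rev.size(1):]              # tokens appended by the RLM
    prefix_ids = torch.flip(new_tokens, dims=[1])  # restore reading order
    return tokenizer.decode(prefix_ids[0], skip_special_tokens=True)
```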

5. Resource Release and Facilitation of Future Research

All model checkpoints (2B and 7B), training code, and pre-training data—comprising 435 billion tokens spanning general text, mathematics, and source code domains—are slated for open release. The intent is to provide a robust experimental baseline for further investigation into reverse autoregressive modeling. Researchers are thus equipped to:

  • Explore alternative training schemes and architectural modifications unique to reverse temporal modeling.
  • Analyze model inductive biases attributable to sequence reversal.
  • Develop hybrid architectures that jointly leverage forward and reverse inference for increased robustness and reasoning symmetry.
  • Investigate safety, simulation, and inference applications made feasible by the availability of backward reasoning models.

This release is positioned to accelerate research in LLM dynamics, posterior evaluation, and automated reasoning, as well as to initiate rigorous scrutiny of the properties conferred by the reverse sequence paradigm.

6. Implications for Language Modeling and Applications

By challenging the foundational left-to-right autoregressive assumption, Ledom introduces a new axis of modeling diversity. It demonstrates that backward token dependency can realize unique behavioral capacities, enhance downstream task performance when combined in bidirectional frameworks, and serve as a stimulus for a broad range of future methodological developments. The availability of both models and training data amplifies its potential influence, providing a platform for examining theoretical conjectures and for applied innovation in tasks requiring backward logical inference, abductive generation, and bidirectional evaluation.