
Dynamic Hierarchical Outlining (DHO)

Updated 23 November 2025
  • Dynamic Hierarchical Outlining (DHO) is a paradigm that uses a multi-tiered planning approach to structure long-form texts, ensuring global coherence and adaptability.
  • The methodology involves generating an initial rough outline that is incrementally expanded into detailed sub-outlines, interleaving planning with generation.
  • DHO integrates memory-enhancement modules to retrieve and update contextual facts, reducing conflicts and maintaining consistency during text generation.

Dynamic Hierarchical Outlining (DHO) is a paradigm in computational text generation that structures the production of long-form documents by recursively planning high-level content and incrementally refining it into detailed passages. DHO has been developed to address limitations in existing LLMs regarding global coherence, macro-structural completeness, and the adaptability needed during the generative process—especially in contexts such as encyclopedic articles and narrative fiction. DHO formalizes outline-based hierarchical planning, integrates dynamic adaptation during generation, and, in recent work, incorporates memory modules to ensure contextual consistency in extended texts (Drissi et al., 2018, Wang et al., 18 Dec 2024).

1. Principles and Hierarchical Structure

DHO systems introduce an explicit division between macro-level planning and micro-level realization. The process maintains a multi-tiered outline, evolving from abstract sections to granular subgoals:

  • Rough Outline ($R$): High-level stages reflecting narrative or topical structure. For narrative domains, these often correspond to canonical frameworks such as Freytag's five-stage dramatic arc (Exposition, Rising Action, Climax, Falling Action, Resolution). For expository writing, each outline node may parallel major article sections.
  • Detailed Outlines ($D = \{d_i\}$): For each high-level stage $r_i \in R$, a set of dynamically expanded sub-outlines $d_i$ guides local coherence within each stage. Detailing occurs in response to the current context and, in recent implementations, the dynamically updated memory.

Unlike static hierarchical planners, DHO merges planning with generation, interleaving outline refinement during the generative process. This supports adaptability to unforeseen developments, ensuring both coverage of predefined abstract goals and responsiveness to local uncertainties (Wang et al., 18 Dec 2024).
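
This two-tier structure can be captured directly in code. Below is a minimal Python sketch of the data model implied by the rough outline $R$ and detailed outlines $D = \{d_i\}$; the class and field names (DHOPlan, RoughStage, DetailedOutline) are illustrative assumptions, not terminology from the cited papers.

from dataclasses import dataclass, field
from typing import List

@dataclass
class DetailedOutline:
    # One dynamically expanded sub-outline d_i^t guiding a single passage.
    description: str
    generated_text: str = ""

@dataclass
class RoughStage:
    # One high-level stage r_i of the rough outline R (e.g., "Rising Action").
    name: str
    sub_outlines: List[DetailedOutline] = field(default_factory=list)

@dataclass
class DHOPlan:
    # The full hierarchical plan: rough outline R plus detailed outlines D = {d_i}.
    prompt: str
    stages: List[RoughStage] = field(default_factory=list)

A planner would first populate stages with the high-level entries and then append DetailedOutline items to each stage as generation proceeds, which is what distinguishes DHO from a static, fully upfront plan.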

2. Algorithmic Workflow and Model Architectures

DHO instantiates as a multi-stage pipeline, with clear architectural separation:

  • Outline Generation Module: Given an input prompt $p$, the module $p_\theta(o \mid p)$ maps $p$ to an outline $o$ using a convolutional sequence-to-sequence encoder–decoder with gated self-attention. Gold outlines are typically extractive, selected from the source corpus using methods such as SumBasic with TF–IDF weighting, ensuring that topic sentences align with human summary judgments (Drissi et al., 2018).
  • Article/Story Generation Module: Conditioned on the outline $o$, $q_\phi(x \mid o)$ synthesizes the full text $x$ via a decoder equipped with hierarchical attention. This decoder operates over both word- and sentence-level outline embeddings, blending fine-grained and coarse discourse cues (see the sketch below).
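
The factorization into $p_\theta(o \mid p)$ and $q_\phi(x \mid o)$ amounts to a two-stage pipeline, sketched minimally below. The OutlineModel and ArticleModel interfaces and their generate methods are hypothetical placeholders, not the convolutional architecture of Drissi et al. (2018).

from typing import Protocol

class OutlineModel(Protocol):
    def generate(self, prompt: str) -> str: ...   # samples o ~ p_theta(o | p)

class ArticleModel(Protocol):
    def generate(self, outline: str) -> str: ...  # samples x ~ q_phi(x | o)

def generate_article(prompt: str,
                     outline_model: OutlineModel,
                     article_model: ArticleModel) -> str:
    outline = outline_model.generate(prompt)   # macro-level plan
    return article_model.generate(outline)     # micro-level realization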

The algorithmic workflow in advanced narrative settings with memory modules is illustrated below (cf. the DOME framework; Wang et al., 18 Dec 2024):

Algorithm 1: DHO with Memory-Enhancement

1. Initialize temporal knowledge graph (KG)
2. Produce rough outline R by prompting LLM with writing theory and input
3. For each stage r_i in R:
   a. Retrieve relevant memory RInfo_i = MEM.query(KG, r_i)
   b. Generate detailed outline d_i = LLM(r_i, RInfo_i)
   c. For each sub-outline do_i^t in d_i:
       i.  Retrieve DInfo_i^t = MEM.query(KG, do_i^t)
       ii. Generate text s_i^t = LLM(do_i^t, DInfo_i^t)
       iii. Extract new triples and insert into KG

This process ensures that at each level, content conditioning incorporates both the hierarchical plan and up-to-date world/model state as maintained in the temporal knowledge graph.
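
A Python rendering of Algorithm 1 is given below as a minimal sketch. The names llm_generate, mem_query, and extract_triples stand in for the prompted LLM calls and the temporal-KG interface described above, and the one-line-per-(sub-)outline parsing is an assumption made purely for brevity.

from typing import Callable, List, Tuple

Quadruple = Tuple[str, str, str, int]  # (subject, action, object, chapter)

def run_dho_with_memory(
    prompt: str,
    llm_generate: Callable[[str], str],
    mem_query: Callable[[List[Quadruple], str], str],
    extract_triples: Callable[[str, int], List[Quadruple]],
) -> str:
    kg: List[Quadruple] = []                                   # step 1: empty temporal KG
    rough = llm_generate(f"Write a five-stage rough outline for: {prompt}")  # step 2: R
    story: List[str] = []
    for chapter, stage in enumerate(rough.splitlines()):       # step 3: each stage r_i
        r_info = mem_query(kg, stage)                          # 3a: retrieve RInfo_i
        detailed = llm_generate(
            f"Expand into sub-outlines: {stage}\nKnown facts: {r_info}")     # 3b: d_i
        for sub in detailed.splitlines():                      # 3c: each sub-outline do_i^t
            d_info = mem_query(kg, sub)                        # 3c-i: retrieve DInfo_i^t
            text = llm_generate(
                f"Write the passage for: {sub}\nKnown facts: {d_info}")      # 3c-ii: s_i^t
            kg.extend(extract_triples(text, chapter))          # 3c-iii: update the KG
            story.append(text)
    return "\n\n".join(story)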

3. Mathematical Formulation and Loss Functions

In two-module DHO, the total loss is the sum of negative log-likelihoods for each independently trained module:

  • Outline loss:

$$\mathcal{L}_{\text{outline}}(\theta) = -\sum_{(p,o)} \log p_\theta(o \mid p), \qquad p_\theta(o \mid p) = \prod_{t=1}^{N} p_\theta(o_t \mid o_{<t}, p)$$

  • Article loss:

$$\mathcal{L}_{\text{article}}(\phi) = -\sum_{(o,x)} \log q_\phi(x \mid o), \qquad q_\phi(x \mid o) = \prod_{t=1}^{T} q_\phi(x_t \mid x_{<t}, o)$$

A combined training objective of $\mathcal{L}_{\text{outline}} + \mathcal{L}_{\text{article}}$ is standard (Drissi et al., 2018).
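
As a concrete illustration, the two losses can be computed as token-level cross-entropies, for example in PyTorch. The sketch below assumes teacher forcing and that each module exposes per-token logits; all names are illustrative.

import torch
import torch.nn.functional as F

def nll(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # logits: (batch, seq_len, vocab); targets: (batch, seq_len).
    # Token-level negative log-likelihood, i.e., -sum_t log p(y_t | y_<t, context).
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1), reduction="sum")

def dho_training_loss(outline_logits, outline_tokens, article_logits, article_tokens):
    # L_outline + L_article: each module is trained on its own NLL and the
    # objectives are simply summed, matching the two-module formulation above.
    return nll(outline_logits, outline_tokens) + nll(article_logits, article_tokens)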

When integrated with memory modules, the system also scores and retrieves contextual facts using embedding similarities and LLM-based semantic scoring. Contextual consistency is automatically evaluated via a Temporal Conflict Analyzer (TCA), which computes the conflict rate over extracted fact quadruples:

$$\mathrm{CR} = \frac{m}{N} \times 100\%$$

where $N$ is the total number of knowledge quadruples and $m$ is the number exhibiting conflict (Wang et al., 18 Dec 2024).
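
The rate itself is a simple ratio once the per-quadruple conflict judgments are available (e.g., from the LLM-prompted analyzer); a minimal sketch:

from typing import List

def conflict_rate(is_conflicting: List[bool]) -> float:
    # CR = m / N * 100, where N is the number of extracted quadruples and
    # m the number judged to conflict with previously established facts.
    n = len(is_conflicting)
    return 100.0 * sum(is_conflicting) / n if n else 0.0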

4. Memory-Enhancement and Contextual Consistency

Recent enhancements embed a Memory-Enhancement Module (MEM) implemented as a temporal knowledge graph (TKG). Each memory is a quadruple $\langle \mathrm{subject}, \mathrm{action}, \mathrm{object}, \mathrm{chapter} \rangle$:

  • Insertion: Segments generated by the model are processed with a triple-extraction LLM prompt to populate the TKG.
  • Retrieval: On a new outline or text-generation request, MEM.query retrieves relevant quadruples using cosine similarity in embedding space, then filters and scores candidates with an LLM on five binary relevance criteria (shared subject, object, action, event, and utility for writing). Results are provided as natural language to augment the LLM’s input context (see the sketch below).
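
A minimal sketch of such a retrieval step follows, assuming an embed function that maps text to a vector and an llm_judge function that applies the binary relevance criteria; both are placeholders for the prompted components described above, and the ranking and filtering details are simplified.

from typing import Callable, List, Tuple
import numpy as np

Quadruple = Tuple[str, str, str, int]  # (subject, action, object, chapter)

def mem_query(kg: List[Quadruple],
              query: str,
              embed: Callable[[str], np.ndarray],
              llm_judge: Callable[[str, Quadruple], bool],
              top_k: int = 20) -> str:
    # 1. Coarse retrieval: rank stored quadruples by cosine similarity to the query.
    q_vec = embed(query)

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    ranked = sorted(kg,
                    key=lambda quad: cosine(q_vec, embed(" ".join(map(str, quad)))),
                    reverse=True)
    # 2. Fine filtering: keep candidates the LLM judges relevant on the binary criteria.
    kept = [quad for quad in ranked[:top_k] if llm_judge(query, quad)]
    # 3. Return the surviving facts as natural language for the generation prompt.
    return "; ".join(f"{s} {a} {o} (chapter {c})" for s, a, o, c in kept)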

Contextual conflicts are diagnosed with the TCA. The analyzer aggregates triples by structural relations, prompts an LLM to summarize each group and judge it for temporal/factual conflict according to explicit criteria, yielding a conflict rate that serves as a global measure of narrative/semantic consistency (Wang et al., 18 Dec 2024).

5. Comparative Evaluation and Empirical Findings

The performance of DHO models is evaluated through both automated and human metrics:

  • Automated metrics:
    • Perplexity (PPL) for stand-alone modules and baselines; a substantial PPL reduction ($\sim$10 points) is observed when conditioning generation on gold outlines (e.g., prompt→article: 31.0, outline→article: 21.1, outline→article + hierarchical attention: 20.5) (Drissi et al., 2018).
    • N-gram entropy (Ent-2) to measure output diversity; DHO improves Ent-2 by 6.87% over the prior state of the art (Wang et al., 18 Dec 2024); see the sketch after this list.
    • Conflict rate (CR) for contextual consistency; implementations with DHO and MEM reduce CR by $\approx 27\%$ and, in ablation, MEM cuts conflicts by 87.61% (Wang et al., 18 Dec 2024).
  • Human metrics:
    • Evaluations include global coherence, quality, plot-completeness, plot-coherence, relevance, interest, and expression-coherence.
    • Notably, improved perplexity does not always translate to better human judgments; DHO generally matches or slightly lags flat baselines in human-perceived coherence and quality (Drissi et al., 2018). In contrast, DHO systems augmented with MEM achieve the best average human rank (near 1.2) across all evaluated metrics (Wang et al., 18 Dec 2024).
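
For reference, the Ent-2 diversity metric cited above can be computed as the Shannon entropy of the empirical bigram distribution of the generated text. This sketch assumes that common definition (with natural logarithms), which may differ in detail from the cited paper's implementation.

from collections import Counter
from math import log
from typing import List

def ent2(tokens: List[str]) -> float:
    # Shannon entropy (natural log) of the empirical bigram distribution.
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = sum(bigrams.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * log(c / total) for c in bigrams.values())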

Table: Excerpted Human Judgments (Means, N=70)

Model | Global Coherence | Overall Quality
prompt→article | 3.36 ± 1.00 | 2.91 ± 1.07
hierarchical prompt→article + h.a. | 2.54 ± 0.96 | 2.26 ± 0.89
outline→article + h.a. | 3.14 ± 1.31 | 2.90 ± 1.22

Statistical tests indicate that the flat baseline often significantly outperforms DHO on perceived coherence and quality, emphasizing a complexity-versus-quality tradeoff (Drissi et al., 2018). In more advanced DHO+MEM systems, however, DHO contributes significant gains in measured consistency and plot coherence (Wang et al., 18 Dec 2024).

6. Characteristic Failure Modes, Limitations, and Extensions

Failure analyses identify both strengths and structural limitations:

  • Strengths: Precision of outlines enhances factual focus and adherence to plan when the outline is reliable. The hierarchical decoder with sentence-level attention yields measurable perplexity gains and slightly richer n-gram diversity.
  • Limitations: Reliance on gold outlines during training leads to brittleness when outlines are noisy at inference. Imprecise outlines cause cascading errors, including repetition, incoherence, and unrecoverable context drift. The article generator cannot recover from off-topic outlines because it is never exposed to poor outlines during training (Drissi et al., 2018).
  • Limitations in metric alignment: Discrepancies between perplexity improvements and human-perceived quality persist, indicating that sequence likelihood alone is insufficient to measure long-form coherence and utility.

Proposed extensions include:

  • Fine-tuning article generators on mixed-quality outlines for robustness.
  • Incorporation of copying mechanisms to enforce tighter coupling between outline and generation.
  • Use of abstractive/planner-based outline generation in place of extractive summarization.
  • Integration of end-to-end training via policy gradient or minimum-risk objectives to better align marginal likelihood with text utility.
  • Enhanced evaluation metrics, such as learned coherence scorers or adversarial discriminators, for more faithful reflection of human judgment (Drissi et al., 2018, Wang et al., 18 Dec 2024).

7. Illustrative Example of Dynamic Outline Adjustment

Consider a five-stage rough outline for a fictional narrative (following the five-stage dramatic arc described above). In the “Rising Action” stage, the memory contains facts about the protagonist’s budget issues and romantic tension. When expanding this stage, DHO retrieves the existing facts, then instructs the LLM to produce a set of three finer-grained chapter outlines. Suppose novel content, such as a new side character, emerges while generating the first sub-chapter. The subsequent outline prompt, aware of this new entity via the updated MEM, dynamically adapts to include the character, ensuring the narrative remains both consistent and resilient to generation-time surprises (Wang et al., 18 Dec 2024).

References

  • Drissi et al., 2018.
  • Wang et al., 18 Dec 2024.
