
LLM-based Text Simplification and its Effect on User Comprehension and Cognitive Load (2505.01980v1)

Published 4 May 2025 in cs.CL

Abstract: Information on the web, such as scientific publications and Wikipedia, often surpasses users' reading level. To help address this, we used a self-refinement approach to develop a LLM capability for minimally lossy text simplification. To validate our approach, we conducted a randomized study involving 4563 participants and 31 texts spanning 6 broad subject areas: PubMed (biomedical scientific articles), biology, law, finance, literature/philosophy, and aerospace/computer science. Participants were randomized to viewing original or simplified texts in a subject area, and answered multiple-choice questions (MCQs) that tested their comprehension of the text. The participants were also asked to provide qualitative feedback such as task difficulty. Our results indicate that participants who read the simplified text answered more MCQs correctly than their counterparts who read the original text (3.9% absolute increase, p<0.05). This gain was most striking with PubMed (14.6%), while more moderate gains were observed for finance (5.5%), aerospace/computer science (3.8%) domains, and legal (3.5%). Notably, the results were robust to whether participants could refer back to the text while answering MCQs. The absolute accuracy decreased by up to ~9% for both original and simplified setups where participants could not refer back to the text, but the ~4% overall improvement persisted. Finally, participants' self-reported perceived ease based on a simplified NASA Task Load Index was greater for those who read the simplified text (absolute change on a 5-point scale 0.33, p<0.05). This randomized study, involving an order of magnitude more participants than prior works, demonstrates the potential of LLMs to make complex information easier to understand. Our work aims to enable a broader audience to better learn and make use of expert knowledge available on the web, improving information accessibility.

Summary

  • The paper introduces an automated, minimally lossy text simplification system using Gemini LLMs, achieving a 3.9% overall MCQ accuracy boost and up to 14.6% in technical domains.
  • The methodology employs a multi-component system—including readability and fidelity models with ranking and prompt refinement—for precise evaluation and iterative improvement.
  • The results demonstrate reduced cognitive load and increased user confidence, highlighting the system’s potential to enhance accessibility of complex online information.

This paper explores the use of LLMs for text simplification and evaluates the effect of simplification on user comprehension and cognitive load. The core problem addressed is that much of the information available online, particularly expert knowledge in domains like science, law, and finance, is written at a reading level inaccessible to a large portion of the population. This hinders effective information dissemination and informed decision-making.

The authors developed a system for "minimally lossy" text simplification using several Gemini LLMs. The system architecture consists of four main components:

  1. Text Simplification Model: This model, based on Gemini 1.5 Flash, takes original text and generates a simplified version. Its performance is driven by a custom prompt and few-shot examples.
  2. Automated Evaluation (Autoeval) System: This system uses two separate models to evaluate the output of the simplification model:
    • Readability Model: A Gemini Ultra model that assigns a readability score (1-10) to the simplified text. It was iteratively developed and improved by comparing its scores to human readability evaluations.
    • Fidelity Model: A Gemini 1.5 Pro model that assesses whether the simplified text accurately preserves the information from the original (completeness) and logically follows from it (entailment). It uses a step-by-step reasoning prompt to break down the original text into atomic claims, map them to the simplified text, and identify specific error types (information loss, information gain, distortion) with empirically determined weights.
  3. Ranking Module: This module scores candidate prompts for the simplification model based on the averaged readability score minus the averaged fidelity error score from the autoeval system.
  4. Prompt Refinement Model: A Gemini 1.5 Pro model that iteratively refines the prompt and few-shot examples for the text simplification model based on the scores from the ranking module. This automated feedback loop helps improve the simplification quality over time.
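The four components above can be sketched as a single scoring-and-refinement loop. Everything below is a hypothetical stand-in: the real system calls Gemini models for simplification, readability, fidelity, and prompt refinement, whereas these deterministic placeholders only illustrate the control flow and the ranking formula (mean readability minus mean fidelity error).

```python
def simplify(prompt: str, text: str) -> str:
    # Stand-in for the Gemini 1.5 Flash simplification model.
    return f"[{prompt}] {text.lower()}"

def readability_score(simplified: str) -> float:
    # Stand-in for the Gemini Ultra readability model (1-10 scale).
    return min(10.0, 100.0 / max(len(simplified.split()), 1))

def fidelity_error(original: str, simplified: str) -> float:
    # Stand-in for the Gemini 1.5 Pro fidelity model, which decomposes the
    # original into atomic claims and weights loss/gain/distortion errors.
    lost = set(original.lower().split()) - set(simplified.split())
    return 0.5 * len(lost)

def rank(prompt: str, corpus: list[str]) -> float:
    # Ranking module: averaged readability minus averaged fidelity error.
    simplified = [simplify(prompt, t) for t in corpus]
    read = sum(readability_score(s) for s in simplified) / len(corpus)
    err = sum(fidelity_error(t, s) for t, s in zip(corpus, simplified)) / len(corpus)
    return read - err

def refine(best_prompt: str, round_idx: int) -> str:
    # Stand-in for the Gemini 1.5 Pro prompt-refinement model, which would
    # rewrite the prompt and few-shot examples given ranking feedback.
    return best_prompt + f" (refined r{round_idx})"

corpus = ["Combined pulmonary fibrosis and emphysema is a syndrome."]
prompt = "Rewrite simply, keeping all facts."
best, best_score = prompt, rank(prompt, corpus)
for r in range(3):
    candidate = refine(best, r)
    score = rank(candidate, corpus)
    if score > best_score:
        best, best_score = candidate, score
```

The loop keeps whichever candidate prompt scores highest, mirroring how the automated feedback loop improves simplification quality without manual labeling.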

This automated system for prompt refinement is a practical implementation pattern for developing and improving LLM-based generation capabilities without extensive manual labeling.

To validate the effectiveness of the simplified text, the authors conducted a large-scale randomized controlled comprehension study involving 4,563 participants and 31 texts from 6 diverse subject areas (PubMed, biology, law, finance, literature/philosophy, aerospace/computer science). Participants were randomized to read either the original or the simplified text (or both) in either an "open-book" setting (text visible while answering questions) or a "closed-book" setting (text not visible). Comprehension was measured using multiple-choice questions (MCQs), and participants also provided self-reported confidence and perceived ease (cognitive load) ratings.
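A minimal sketch of the randomization described above. The arm names and the uniform independent assignment are assumptions for illustration; the actual protocol may balance arms differently.

```python
import random

SUBJECTS = ["PubMed", "biology", "law", "finance",
            "literature/philosophy", "aerospace/computer science"]

def assign(participant_id: int, rng: random.Random) -> dict:
    # Each participant gets one subject area, a text condition, and a
    # book setting (open-book: text visible while answering MCQs).
    return {
        "id": participant_id,
        "subject": rng.choice(SUBJECTS),
        "condition": rng.choice(["original", "simplified", "both"]),
        "setting": rng.choice(["open-book", "closed-book"]),
    }

rng = random.Random(0)
arms = [assign(i, rng) for i in range(4563)]
```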

The paper's results demonstrated significant practical benefits:

  • Improved Comprehension: Participants who read the simplified text achieved a statistically significant 3.9% absolute increase in overall MCQ accuracy compared to those who read the original text (48.2% vs. 44.3%, p<0.05).
  • Domain Specific Gains: The gains were most pronounced for highly technical domains like PubMed (14.6% absolute increase), indicating the approach is particularly effective for complex scientific text. Moderate gains were also observed in finance (5.5%), aerospace/computer science (3.8%), and legal (3.5%).
  • Robustness: The comprehension improvements were robust even in the "closed-book" setting, suggesting that simplification helps with both immediate understanding and short-term retention.
  • Reduced Cognitive Load: Participants reported significantly higher perceived ease (lower cognitive load) when reading simplified texts (0.33 absolute increase on a 5-point scale, p<0.05) and increased confidence (0.24 absolute increase on a [-2, 2] scale, p<0.05).
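As a rough sanity check on the headline result, a two-proportion z-test on the reported accuracies reproduces significance at the 5% level. The per-arm group sizes here are an assumption (roughly half of the 4,563 participants each); the paper's actual analysis may differ, e.g., by accounting for repeated measures per participant.

```python
from math import sqrt, erf

# Reported overall MCQ accuracies: simplified 48.2%, original 44.3%.
n1 = n2 = 4563 // 2          # assumed: roughly equal arms
p1, p2 = 0.482, 0.443

# Two-proportion z-test with a pooled standard error.
p_pool = (p1 * n1 + p2 * n2) / (n1 + n2)
se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se

# Two-sided p-value via the normal CDF, Phi(z) = 0.5 * (1 + erf(z / sqrt(2))).
p_value = 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))
```

Under these assumptions the 3.9% absolute gap clears the conventional z > 1.96 threshold, consistent with the paper's p<0.05 claim.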

The paper highlights that simplified text led to greater accuracy improvements for questions where the original text was harder to understand (those with lower original accuracy, as shown in Figure 6). Table 3 provides concrete examples from the PubMed domain where accuracy improvements were particularly large (e.g., a 38% absolute increase for a question about Combined Pulmonary Fibrosis and Emphysema pathophysiology).

For practical implementation, the findings suggest that LLM-based simplification can be a valuable tool for improving information accessibility on the web. Potential deployment strategies include:

  • User-Triggered Simplification: Users could highlight text they find difficult and request a simplified version on demand. This requires a fast simplification model, which is why using a model like Gemini 1.5 Flash is a practical consideration.
  • Automated Background Simplification: A system could potentially identify challenging sections of text and pre-generate simplified versions, perhaps personalized to the user's likely reading level, to minimize perceived latency. This approach requires more computation but offers a smoother user experience.
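Both deployment strategies share the same shape: a simplification call behind a cache, with the background variant simply warming that cache ahead of time. A minimal sketch, where the model call is a placeholder and the names `get_simplified` and `pregenerate` are hypothetical:

```python
import functools

def simplify_with_llm(text: str) -> str:
    # Stand-in for a call to a fast simplification model; the paper uses
    # Gemini 1.5 Flash partly for this latency reason.
    return "simplified: " + text

@functools.lru_cache(maxsize=1024)
def get_simplified(text: str) -> str:
    # User-triggered path: cache on-demand results so repeated requests
    # for the same passage skip the model call.
    return simplify_with_llm(text)

def pregenerate(passages: list[str]) -> None:
    # Background path: warm the cache for passages flagged as hard,
    # trading compute now for lower perceived latency later.
    for p in passages:
        get_simplified(p)
```

The trade-off noted below falls out directly: `pregenerate` spends compute up front, while the user-triggered path pays latency only on first request.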

The paper emphasizes that the automated prompt refinement system is generalizable and could be applied to developing other LLM capabilities. The trade-off in deployment lies between the computational cost of pre-processing (automated) and the responsiveness required for on-demand simplification (user-triggered).

Limitations noted include the study population (general survey takers, who may be less motivated than real-world users actively seeking information), the potential for simplification errors to introduce inaccuracies (though the goal is minimally lossy simplification), and the possibility that MCQs do not capture the full depth of understanding. Future work could evaluate the impact on motivated users, use alternative assessment methods, and incorporate retrieval augmentation to handle novel or rapidly changing information.

In summary, this research provides a validated method for building and evaluating LLM-based text simplification systems, demonstrating significant practical gains in comprehension and perceived ease, particularly for technical content, paving the way for improved information accessibility online.