
Aspect-based Summarization (ABS)

Updated 23 January 2026
  • Aspect-based summarization (ABS) is a method that conditionally generates focused summaries by extracting content relevant to user-specified or discovered aspects.
  • ABS employs diverse methodologies—including two-stage pipelines, end-to-end conditional models, and retrieval-augmented techniques—to improve scalability, precision, and adaptability.
  • ABS enhances personalized content analysis across various domains such as product reviews, healthcare, legal, and scientific texts, ensuring relevance and traceability.

Aspect-based summarization (ABS) refers to the family of techniques and benchmarks for generating summaries that are explicitly conditioned on user-specified or content-derived aspects, rather than providing purely generic overviews. The formalisms, system architectures, and evaluation paradigms for ABS have evolved to accommodate domain- and open-vocabulary aspects, scalability, and faithfulness, making ABS a foundation for personalized and information-need-driven summarization in both research and production contexts.

1. Formal Definition and Task Variants

Aspect-based summarization is formally defined as conditional generation: given a document (or multi-document set) D and an aspect a (typically a word, phrase, or short string, but possibly a free-form user query), produce a summary S^a that includes only the information from D relevant to a (Hayashi et al., 2020, Yang et al., 2022, Amar et al., 2023, Santosh et al., 2024). The canonical parametric form is

P_\theta(S^a \mid D, a) = \prod_{t=1}^{|S^a|} P_\theta(y_t \mid y_{<t}, D, a)

where θ denotes the model parameters.
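
In practice, the conditioning on the aspect is often realized by encoding the aspect together with the document. Below is a minimal illustrative sketch (not the setup of any specific cited system) that prepends the aspect to the input of an off-the-shelf Hugging Face seq2seq summarizer; the model name, the "aspect: ... | document: ..." template, and the generation settings are assumptions for illustration.

```python
# Minimal sketch of aspect-conditioned generation P(S^a | D, a):
# the aspect a is prepended to the document D so that decoding is
# conditioned on both. Model and input template are illustrative
# assumptions, not a specific system from the cited papers.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "facebook/bart-large-cnn"  # assumed off-the-shelf summarizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

def aspect_summarize(document: str, aspect: str, max_new_tokens: int = 128) -> str:
    """Generate a summary of `document` conditioned on `aspect`."""
    text = f"aspect: {aspect} | document: {document}"
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(aspect_summarize("Full product review text ...", "battery life"))
```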

ABS is situated between generic abstractive summarization and query-focused summarization; it encompasses fixed-aspect scenarios (e.g., pre-selected topical slots), open, ad hoc aspect specification, and dynamic settings where aspects must be discovered from the content (Hayashi et al., 2020, Guo et al., 2024).

Variants include:

  • Fixed-aspect summarization over a predefined aspect inventory (e.g., Wikipedia section titles or clinical aspect sets).
  • Open-vocabulary ABS, where the aspect is an arbitrary user-supplied phrase or query, possibly unseen at training time.
  • Multi-document ABS, where a single aspect conditions summarization over a document cluster.
  • Joint aspect discovery and summarization, where aspects must first be induced from the content before summaries are generated.

2. Datasets and Annotation Protocols

ABS research has benefitted from a surge of datasets, spanning both domain-specific and open-domain settings.

  • WikiAsp (Hayashi et al., 2020): Multi-domain (20 domains) dataset derived from Wikipedia, using section titles as aspects and section text as summaries. Features 399,696 instances.
  • OASum (Yang et al., 2022): Over 3.7M aspect-summary pairs mined from 2M Wikipedia pages, with 1.05M unique aspects, enabling open-domain ABS and pretraining.
  • OpenAsp (Amar et al., 2023): Multi-document, open-aspect dataset built via crowdsourcing from existing summarization corpora; supports ad hoc aspect queries over large clusters.
  • LexAbSumm (Santosh et al., 2024): Legal domain (European Court of Human Rights), with manually aligned (aspect, judgment, aspect-summary) triplets.
  • ACLSum (Takeshita et al., 2024): Scholarly domain; 250 scientific papers annotated for three classical aspects (Challenge, Approach, Outcome).
  • TracSum (Chu et al., 19 Aug 2025): Medical abstracts, annotated for seven clinical aspects, with fine-grained sentence-level traceability between summary and source.
  • AmaSum (product reviews; Zhou et al., 11 Jun 2025), CHA-Summ (consumer health Q&A; Chaturvedi et al., 2024), SumIPCC (climate change reports; Ghinassi et al., 2024), and BookAsSumQA (long-form books with QA-based evaluation; Miyazato et al., 9 Nov 2025).

Open-domain ABS datasets often use cost-efficient protocols to extract aspect-summaries from existing gold data, such as mapping generic summary sentences to aspects (Amar et al., 2023, Yang et al., 2022).
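
A plausible, hedged sketch of such a mapping (not the exact OASum or OpenAsp annotation protocol) assigns each generic summary sentence to the most similar aspect label in embedding space; the sentence-transformers model name and the similarity threshold below are assumptions.

```python
# Illustrative sketch: map generic summary sentences to aspects by
# embedding similarity. This is NOT the exact annotation protocol of
# OASum/OpenAsp; the encoder name and threshold are assumptions.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder

def map_sentences_to_aspects(summary_sentences, aspects, threshold=0.3):
    """Return {aspect: [sentences]} by nearest-aspect cosine similarity."""
    sent_emb = encoder.encode(summary_sentences, convert_to_tensor=True)
    asp_emb = encoder.encode(aspects, convert_to_tensor=True)
    sims = util.cos_sim(sent_emb, asp_emb)  # shape: (num_sentences, num_aspects)
    mapping = {a: [] for a in aspects}
    for i, sent in enumerate(summary_sentences):
        best = int(sims[i].argmax())
        if float(sims[i][best]) >= threshold:  # drop sentences with no clear aspect
            mapping[aspects[best]].append(sent)
    return mapping
```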

3. System Architectures and Methodological Advances

ABS architectures can be grouped into several major paradigms:

  • Two-stage pipelines that first extract or retrieve aspect-relevant content and then summarize it.
  • End-to-end conditional models that encode the aspect jointly with the document and decode the summary directly.
  • Retrieval-augmented approaches that ground generation in retrieved evidence, particularly for long or multi-document inputs.
  • Prompt-based LLM pipelines that express the aspect and task constraints in a natural-language prompt, often with few-shot examples.

Prompt-based pipelines, especially with LLMs (e.g., Gemini 1.5 Flash (Boytsov et al., 30 Sep 2025)) and few-shot examples, are prevalent in production settings for structured data types (reviews, CQA) (Boytsov et al., 30 Sep 2025, Chaturvedi et al., 2024).
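
The following is a rough sketch of such a prompt-based pipeline for review summarization; the prompt wording, the few-shot example, and the call_llm helper are hypothetical placeholders rather than the prompts or APIs of the cited production systems.

```python
# Rough sketch of a few-shot, prompt-based ABS pipeline for reviews.
# The prompt template, few-shot example, and `call_llm` are hypothetical
# placeholders; the cited production pipelines use their own prompts/APIs.
FEW_SHOT = [
    {
        "aspect": "assembly",
        "reviews": "Took 20 minutes with the included tools. / Instructions were unclear.",
        "summary": "Assembly is quick for most buyers, though some find the instructions unclear.",
    },
]

def build_prompt(aspect: str, reviews: list[str]) -> str:
    lines = ["Summarize only the content relevant to the given aspect."]
    for ex in FEW_SHOT:
        lines += [f"Aspect: {ex['aspect']}", f"Reviews: {ex['reviews']}",
                  f"Summary: {ex['summary']}", ""]
    lines += [f"Aspect: {aspect}", "Reviews: " + " / ".join(reviews), "Summary:"]
    return "\n".join(lines)

def call_llm(prompt: str) -> str:
    """Hypothetical LLM client; replace with your provider's API."""
    raise NotImplementedError

# summary = call_llm(build_prompt("battery life", ["Lasts two days.", "Charges slowly."]))
```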

4. Evaluation Protocols and Key Metrics

ABS evaluation spans standard summarization metrics and novel aspect-specific criteria (a minimal scoring sketch follows the list):

  • Overlap-based metrics such as ROUGE, computed against aspect-specific reference summaries.
  • Aspect sensitivity: whether outputs genuinely differ when the same input is summarized under different aspects.
  • Faithfulness and traceability, including sentence-level alignment between summary statements and source evidence.
  • QA-based frameworks that test whether a summary supports answers to aspect-related questions.
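
A minimal scoring sketch, assuming the rouge-score package: ROUGE against an aspect-specific reference, plus a crude aspect-sensitivity probe that compares summaries generated for different aspects of the same document. The probe is an illustrative heuristic, not a metric from the cited work.

```python
# Minimal evaluation sketch: ROUGE against an aspect-specific reference,
# plus a crude aspect-sensitivity probe. The probe (lower overlap between
# summaries for different aspects => more aspect-sensitive) is an
# illustrative heuristic, not a published metric.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)

def rouge_f1(reference: str, candidate: str) -> dict:
    """F1 scores of the candidate summary against the aspect-specific reference."""
    scores = scorer.score(reference, candidate)
    return {name: s.fmeasure for name, s in scores.items()}

def aspect_sensitivity(summary_a: str, summary_b: str) -> float:
    """ROUGE-L F1 between summaries of two different aspects of the same input."""
    return scorer.score(summary_a, summary_b)["rougeL"].fmeasure
```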

5. Domain-Specific Applications and Scaling

ABS is established as a key enabler of user-aligned summarization in domains with high information complexity:

  • Product Reviews: Identify and summarize top aspect-sentiment pairs (quality, assembly, value, etc.), often at production scales (Wayfair pipeline: 11.8M reviews, 19K canonical aspects) (Boytsov et al., 30 Sep 2025, Zhou et al., 11 Jun 2025).
  • Health & Medicine: Multi-aspect summarization in medical Q&A and clinical abstracts, incorporating explicit mapping to semantic frames (suggestion, experience, information, questions), and traceable sentence-level citations to facilitate verification by practitioners (Chaturvedi et al., 2024, Chu et al., 19 Aug 2025).
  • Scientific and Legal Documents: Disentangle discourse roles (problem, method, result, legal reasoning); support granular information retrieval for researchers and professionals under length and abstraction constraints (Takeshita et al., 2024, Santosh et al., 2024).
  • Climate and Long-form Narratives: Segment climate reports or books into high-level aspects (e.g., genre, policy topic), with RAG augmentations for scale (Ghinassi et al., 2024, Miyazato et al., 9 Nov 2025).

ABS at scale requires robust aspect de-duplication, ontological canonicalization, and sampling strategies to maintain both interpretability and relevance in the presence of aspect diversity and lexical variation (Boytsov et al., 30 Sep 2025, Zhou et al., 11 Jun 2025).
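
As a toy illustration of the de-duplication step only (not the canonicalization approach of the cited pipelines), near-duplicate aspect strings can be grouped by simple string similarity after normalization; the 0.85 threshold is an assumption, and production systems typically rely on embeddings and curated ontologies instead.

```python
# Toy sketch of aspect de-duplication by lexical similarity: normalize
# aspect strings, then greedily group near-duplicates under a canonical
# representative. The 0.85 threshold is an assumption; production
# pipelines would use embeddings and curated ontologies instead.
from difflib import SequenceMatcher

def normalize(aspect: str) -> str:
    return " ".join(aspect.lower().split())

def canonicalize(aspects: list[str], threshold: float = 0.85) -> dict[str, str]:
    """Map each raw aspect string to a canonical representative."""
    canonical: list[str] = []
    mapping: dict[str, str] = {}
    for raw in aspects:
        norm = normalize(raw)
        match = next(
            (c for c in canonical if SequenceMatcher(None, norm, c).ratio() >= threshold),
            None,
        )
        if match is None:
            canonical.append(norm)
            match = norm
        mapping[raw] = match
    return mapping

print(canonicalize(["Battery life", "battery-life", "assembly", "Assembly time"]))
```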

6. Open Problems and Research Directions

Despite major advances, several challenges persist in ABS research:

  • Aspect Discovery: Accurately inducing the correct number and type of aspects dynamically, especially in disordered or open-domain texts (Guo et al., 2024).
  • Aspect Sensitivity: Ensuring output distinctness for different aspect prompts, with evidence that current models often collapse to generic, aspect-agnostic generation on long inputs (Santosh et al., 2024, Amar et al., 2023).
  • Faithfulness and Traceability: Reducing hallucinations and providing explicit evidence support, especially in high-stakes domains like law and medicine (Chu et al., 19 Aug 2025, Santosh et al., 2024).
  • Open-vocabulary Generalization: Handling arbitrary or unseen aspect queries with limited or no supervision, including zero-shot adaptation (Yang et al., 2022, Amar et al., 2023).
  • Scalability and Eco-efficiency: Balancing summary quality with architectural efficiency and environmental cost, particularly as LLMs are increasingly deployed (Ghinassi et al., 2024).
  • Evaluation Gaps: Shortcomings of overlap-based metrics and need for aspect-sensitive or answer-supporting metrics such as QA-based frameworks (Miyazato et al., 9 Nov 2025).

Current trends emphasize (a) multi-objective and retrieval-augmented architectures (Feng et al., 17 Apr 2025, Guo et al., 2024), (b) joint discovery-generation (JADS (Guo et al., 2024)), (c) leveraging weak supervision and external knowledge bases (Tan et al., 2020), and (d) large-scale resource creation to facilitate open-domain and low-resource ABS.



Aspect-based summarization remains a core challenge at the intersection of targeted information access, scalable NLP, and user-centric document understanding. Progress in methodologies, benchmarks, and data-efficient learning continues to redefine its scope and impact.
