Aspect-Based Summarization
- Aspect-based summarization is a technique that targets specific aspects of texts to produce focused, user-aligned summaries across various domains.
- State-of-the-art methods leverage encoder-decoder models, long-context architectures, and dynamic aspect inference to ensure accurate alignment and traceability.
- Robust evaluation frameworks and diverse datasets validate ABS by measuring relevance, conciseness, and domain scalability in practical applications.
Aspect-based summarization (ABS) is the task of generating concise, targeted summaries that selectively condense information from a source text (or set of texts) with respect to specified aspects or subtopics. Unlike generic summarization, which yields a single overall digest, ABS conditions generation on explicit aspect cues—enabling multifaceted, user-aligned outputs relevant for domains ranging from reviews and meetings to scientific papers, legal decisions, health forums, climate reports, and dynamic settings with open or induced aspect inventories.
1. Formal Definitions and Problem Scope
Aspect-based summarization is typically defined as follows. Given a document $D$ and either a fixed set of aspect labels $\mathcal{A} = \{a_1, \dots, a_k\}$ (closed ABS) or any user-specified aspect $a$ (open ABS), the goal is to produce one summary $y_a$ per aspect, maximizing relevance to $a$ and conciseness. Mathematically, most models seek parameters $\theta$ to maximize
$$\log p_\theta(y_a \mid D, a) = \sum_t \log p_\theta(y_{a,t} \mid y_{a,<t}, D, a),$$
with cross-entropy loss over reference summaries. In multi-document or open settings, inputs generalize to document sets $\{D_1, \dots, D_n\}$ and an arbitrary aspect phrase $a$.
Dynamic and self-supervised variants further expand the scope: the number of aspects is not provided and must be inferred, and aspect discovery is either unsupervised (e.g., clustering, topic induction) or learned jointly with summarization (Guo et al., 5 Jun 2024, Guo et al., 28 May 2024, Guo et al., 16 Feb 2024).
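The closed/open distinction above can be made concrete with a small sketch. The `aspect: … [SEP] …` template below is an illustrative convention for aspect conditioning, not the exact input format of any cited system:

```python
# Sketch: building aspect-conditioned inputs for closed vs. open ABS.
# The "aspect: ... [SEP] ..." template is an illustrative convention.

def format_input(document: str, aspect: str) -> str:
    """Prepend the aspect cue to the source text, as prompted
    encoder-decoder models typically do."""
    return f"aspect: {aspect} [SEP] {document}"

def closed_abs_inputs(document: str, aspect_inventory: list) -> dict:
    """Closed ABS: one conditioned input per aspect in a fixed inventory."""
    return {a: format_input(document, a) for a in aspect_inventory}

def open_abs_input(document: str, user_aspect: str) -> str:
    """Open ABS: any user-specified aspect phrase is allowed."""
    return format_input(document, user_aspect)

doc = "The battery lasts two days. The screen is dim outdoors."
inputs = closed_abs_inputs(doc, ["battery", "display"])
print(inputs["battery"])
# aspect: battery [SEP] The battery lasts two days. The screen is dim outdoors.
```

A downstream summarizer then decodes one summary per conditioned input; in the dynamic variants, the aspect inventory itself is induced rather than supplied.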
2. Dataset Construction and Annotation Protocols
Multiple large-scale datasets underpin ABS research:
- OASum (Yang et al., 2022): >3.7M triples from Wikipedia, aspect = section title, summary = head sentence(s) matched with high ROUGE to aspect section. Covers ~1M unique aspects and 32K domains.
- OpenAsp (Amar et al., 2023): 1,310 aspect summaries over 419 multi-document sets, with aspects derived from generic summaries and highlighted by expert annotators.
- LexAbSumm (Santosh et al., 31 Mar 2024): >1K triplets (legal judgment, aspect, summary), aspects from fact sheets/themes, summary sections manually extracted and filtered for legal domain.
- ACLSum (Takeshita et al., 8 Mar 2024): Scientific papers (250), each split into challenge/approach/outcome aspects, with sentence highlights and expert-written single-sentence aspect summaries.
- TracSum (Chu et al., 19 Aug 2025): 500 medical abstracts labeled for seven medical aspects (+sentence-level citations for traceability), yielding 3,500 aspect-summary-citation triples.
- SumIPCC (Ghinassi et al., 21 Nov 2024): 140 aspect-focused summaries from climate-change reports, annotated by experts with explicit aspect-topic tags.
- Domain-specific datasets: Tourist reviews (Mukherjee et al., 2020), consumer health answers (Chaturvedi et al., 10 May 2024), product reviews (Tian et al., 8 Oct 2025, Zhou et al., 11 Jun 2025), meetings (Deng et al., 2023), etc.
Annotation strategies vary: automatic (section/title mapping, distant supervision, clustering), expert extraction from summaries (OpenAsp), multi-stage human labeling for relevance and aspect type (TracSum, Consumer Health), and argument-scheme driven extraction via LLMs (AseSum).
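The automatic pairing strategy (e.g., OASum's matching of summary sentences to aspect sections by high ROUGE) can be sketched as follows; the unigram-F1 scorer here is a simplified stand-in for ROUGE-1, and the threshold is illustrative:

```python
# Sketch: distant supervision by lexical matching, in the spirit of
# OASum-style pairing (aspect = section title, summary = sentences that
# overlap strongly with the aspect section). unigram_f1 is a simplified
# stand-in for ROUGE-1; the 0.3 threshold is illustrative.

def unigram_f1(candidate: str, reference: str) -> float:
    c, r = set(candidate.lower().split()), set(reference.lower().split())
    if not c or not r:
        return 0.0
    overlap = len(c & r)
    if overlap == 0:
        return 0.0
    prec, rec = overlap / len(c), overlap / len(r)
    return 2 * prec * rec / (prec + rec)

def match_summary_sentences(summary_sents, aspect_section, threshold=0.3):
    """Keep summary sentences that overlap strongly with the aspect section."""
    return [s for s in summary_sents if unigram_f1(s, aspect_section) >= threshold]

section = "battery life measured over two days of mixed use"
sents = ["battery life is about two days", "the camera is mediocre"]
print(match_summary_sentences(sents, section))
# ['battery life is about two days']
```

Real pipelines use full ROUGE with stemming and length controls, but the selection logic is the same: score each candidate sentence against the aspect's section and keep high scorers as pseudo-gold summaries.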
3. Modeling Paradigms: Architectures and Aspect Conditioning
The principal challenge in ABS is enforcing aspect focus and fine-grained alignment. Approaches fall into several categories:
- Prompted encoder-decoder models: Sequence-to-sequence architectures (BART, T5, Longformer-Encoder-Decoder/LED) condition generation by prepending the aspect label or an aspect token to the input, processing the concatenation $[a; D]$ (Yang et al., 2022, Santosh et al., 31 Mar 2024, Amar et al., 2023).
- Long-context models: LED, PRIMERA, LongT5 (Santosh et al., 31 Mar 2024) handle context lengths up to 16K tokens, permitting comprehensive aspect extraction over very long inputs.
- Chunked/fusion-in-decoder architectures: SLED-BART, Unlimiformer-BART split documents into overlapping chunks, each chunk conditioned on the aspect token, then fuse encoded representations (Santosh et al., 31 Mar 2024).
- Self-supervised joint discovery: JADS (Guo et al., 28 May 2024) trains a Longformer encoder-decoder on shuffled sentence pools, with the decoder forced to break output into [SEP]-delimited aspect summaries. No explicit aspect labels are seen; latent topic clusters emerge from the alignment of [SEP]-segments to shuffled document summaries.
- Multi-objective learning for dynamic aspect inference: MODABS (Guo et al., 5 Jun 2024) augments backbone models with (a) aspect-count prediction, (b) cross-entropy for per-aspect summary alignment, and (c) KL-divergence-based inter-aspect diversity loss; aspect channels are decoded in parallel and aspect number inferred via an auxiliary classifier head.
- Retrieval Enhanced Approaches: Embedding-driven chunk/sentence selection filters and prunes input before generation, as in SARESG (Feng et al., 17 Apr 2025), which computes a similarity score $\cos(e_s, e_a)$ for every sentence $s$ with respect to the aspect embedding $e_a$, extracting the top-$k$ contextually relevant units subject to token budget constraints.
- Argumentation Scheme Extraction: AseSum (Zhou et al., 11 Jun 2025) uses LLMs to extract argument triples (aspect, sentiment, evidence) from reviews, clusters and re-ranks by salience and validity, and assembles evidence-centric aspect summaries.
- Multi-label classification + abstraction: AMTSum (Deng et al., 2023) leverages BERT-based sentence classification to assign aspect-relevance pseudo-labels, then feeds the filtered set to a summarizer.
- Convolutional and multi-task CNNs: Early work (Wu et al., 2015) mapped sentences to aspect labels via cascaded and shared-embedding CNNs; sentiment was predicted only for aspect-positive sentences.
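The retrieval-enhanced selection step above can be sketched with a toy scorer. Real systems like SARESG use neural sentence embeddings; the count-vector cosine below is a simplified stand-in, and the token budget is an assumed parameter:

```python
# Sketch: retrieval-enhanced input pruning in the spirit of SARESG —
# score each sentence against the aspect and keep the most relevant
# units under a token budget. Real systems use neural embeddings;
# this bag-of-words cosine is a simplified stand-in.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_sentences(sentences, aspect, token_budget):
    """Rank sentences by aspect similarity, keep top ones within budget."""
    ranked = sorted(sentences, key=lambda s: cosine(embed(s), embed(aspect)),
                    reverse=True)
    kept, used = [], 0
    for s in ranked:
        n = len(s.split())
        if used + n <= token_budget:
            kept.append(s)
            used += n
    return kept

sents = ["Battery drains fast under load.",
         "The box includes a charger.",
         "Battery life improves after updates."]
print(select_sentences(sents, "battery life", token_budget=12))
# ['Battery life improves after updates.', 'Battery drains fast under load.']
```

The pruned set (rather than the full document) is then passed to the generator, which keeps long inputs within the model's context window.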
4. Training Objectives and Supervision
The standard generative objective is token-level cross-entropy over gold summaries, $\mathcal{L}_{\text{CE}} = -\sum_t \log p_\theta(y_t \mid y_{<t}, D, a)$. Extractor/selector components utilize binary cross-entropy over aspect labels. MODABS (Guo et al., 5 Jun 2024) combines alignment, diversity (KL divergence), and count losses, $\mathcal{L} = \mathcal{L}_{\text{align}} + \lambda_1 \mathcal{L}_{\text{div}} + \lambda_2 \mathcal{L}_{\text{count}}$. JADS uses a dataset where each training input concatenates shuffled sentences from source documents, with output as [SEP]-separated gold summaries; no aspect labels are available at train or test time.
Weak and self-supervised recipes (e.g., ConceptNet aspect expansion (Tan et al., 2020)) generate pseudo golds for arbitrary aspect summarization.
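A minimal sketch of the MODABS-style multi-objective combination follows. The loss weights, the sign convention for the diversity term (penalizing *small* inter-aspect KL divergence), and the toy list-based distributions are assumptions for illustration, not values from the paper:

```python
# Sketch: combining MODABS-style objectives — per-aspect alignment
# (cross-entropy), inter-aspect diversity (KL divergence between aspect
# output distributions), and an aspect-count loss. Weights and the
# diversity sign convention are illustrative assumptions.
import math

def cross_entropy(pred, gold_index):
    """Negative log-likelihood of the gold token under a toy distribution."""
    return -math.log(pred[gold_index])

def kl(p, q):
    """KL(p || q) for strictly positive toy distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def modabs_loss(token_preds, gold_tokens, aspect_dists,
                pred_count, true_count, w_div=0.1, w_count=0.1):
    # Alignment: cross-entropy of each predicted token vs. gold.
    align = sum(cross_entropy(p, g) for p, g in zip(token_preds, gold_tokens))
    # Diversity: reward pairwise divergence between aspect channels
    # (negated so that more-distinct channels lower the loss).
    div = -sum(kl(aspect_dists[i], aspect_dists[j])
               for i in range(len(aspect_dists))
               for j in range(len(aspect_dists)) if i != j)
    # Count: penalize predicting too many or too few aspects.
    count = abs(pred_count - true_count)
    return align + w_div * div + w_count * count
```

Under this formulation, distinct aspect channels and a correct aspect count both lower the total loss relative to alignment alone, which is the intended pressure toward well-separated, correctly enumerated aspect summaries.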
5. Evaluation Frameworks and Metrics
Metrics utilized for ABS:
- Automatic: ROUGE-1/2/L, BLEU, METEOR, BERTScore, macro-F1 for aspect classification (textual and extractive coverage), longest common subsequence (ROUGE-L_sum), aspect-count difference ($|\Delta N|$), and cluster purity/homogeneity for aspect induction (Guo et al., 28 May 2024, Guo et al., 5 Jun 2024).
- Human: Quality dimensions (coherence, consistency, fluency, relevance, aspect quality), paired ranking, and semantic evaluations via LLM critics (GPT-4) for aspect relevance, coverage, and impurity (Mullick et al., 5 Aug 2024).
- Task-specific: In medical ABS, traceability is assessed via claim and citation recall/precision (CLR/CIR/CLP/CIP) decomposed into subclaims and NLI entailment checks (Chu et al., 19 Aug 2025).
- Book/long-document QA: Coverage is scored as the average answer accuracy over reference QA pairs automatically extracted from a narrative knowledge graph (Miyazato et al., 9 Nov 2025).
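Two of the lighter-weight metrics above are simple enough to sketch directly; the integer cluster labels and toy inputs below are illustrative:

```python
# Sketch: two lightweight ABS metrics — absolute aspect-count difference
# and cluster purity for induced aspects. Cluster labels are toy
# integers; real evaluations pair these with ROUGE/BERTScore.
from collections import Counter

def aspect_count_diff(pred_summaries, gold_summaries):
    """|N_pred - N_gold|: penalizes over-/under-generation of aspects."""
    return abs(len(pred_summaries) - len(gold_summaries))

def purity(cluster_assignments, gold_labels):
    """Fraction of items whose cluster's majority gold label covers them."""
    clusters = {}
    for c, g in zip(cluster_assignments, gold_labels):
        clusters.setdefault(c, []).append(g)
    correct = sum(Counter(members).most_common(1)[0][1]
                  for members in clusters.values())
    return correct / len(gold_labels)

print(aspect_count_diff(["s1", "s2", "s3"], ["g1", "g2"]))  # 1
print(purity([0, 0, 1, 1], ["a", "a", "b", "a"]))           # 0.75
```

High purity with a small count difference indicates that an unsupervised system both discovered the right number of aspects and grouped content consistently with the gold aspect labels.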
6. Applications and Domain Adaptations
ABS has proven valuable in diverse domains:
- Meetings: AMTSum (Deng et al., 2023) addresses interleaved, scattered aspect mentions in multi-party transcripts, extracting "Problems," "Actions," "Decisions" with substantially higher ROUGE than both generic and LLM-prompted approaches.
- Legal: LexAbSumm (Santosh et al., 31 Mar 2024) enables efficient extraction of "facts," "holdings," etc., overcoming input-length and aspect focus bottlenecks; chunking and fusion improve aspect sensitivity.
- Scientific: ACLSum (Takeshita et al., 8 Mar 2024) supports research analytics by exposing "challenge," "approach," "outcome," with best ROUGE from end-to-end T5_large.
- Health QA: Aspect-classification via linguistic cues (personal pronouns, grammatical mood) and fine-tuned RoBERTa pipelines effectively separate suggestions, experiences, information (Chaturvedi et al., 10 May 2024).
- Climate: SumIPCC (Ghinassi et al., 21 Nov 2024) integrates summarization and retrieval with carbon-aware scoring, showing SLMs can rival LLMs in aspect-focused highlighting at greatly reduced emissions.
- Dynamic/disordered text: DABS (Guo et al., 16 Feb 2024) and MODABS (Guo et al., 5 Jun 2024) show robust unsupervised aspect discovery in settings with shuffled text and ambiguous aspect boundaries.
- Opinion reviews: Classic work (Wu et al., 2015) established CNNs for aspect mapping and sentiment assignment on multi-aspect smartphone reviews; recent frameworks (AseSum (Zhou et al., 11 Jun 2025)) auto-induce aspects via LLM and argumentation prompts.
7. Challenges, Limitations, and Frontiers
Key challenges include:
- Aspect relevance and differentiation: Many models lose focus, generating generic summaries even with aspect conditioning; chunk-position, input-overlap, and aspect token proximity are critical (Santosh et al., 31 Mar 2024).
- Overgeneration/undergeneration of aspects: LLMs often produce too many/few aspect summaries; count prediction and dynamic aspect selection (MODABS, JADS) mitigate this (Guo et al., 5 Jun 2024, Guo et al., 28 May 2024).
- Traceability and factuality: Medical/legal/high-stakes domains require provenance of claims to source sentences—traceable pipelines such as TracSum (Chu et al., 19 Aug 2025) and argument schemes in AseSum (Zhou et al., 11 Jun 2025) address this.
- Scalability: Long documents and multi-document corpora necessitate chunking/fusion or hierarchical summarization (LexAbSumm, BookAsSumQA (Miyazato et al., 9 Nov 2025)); recursive summarization strategies are critical for token budget constraints.
- Evaluation without golds: QA-based, reference-free metrics (BookAsSumQA), aspect-coverage via extractive mapping, and human/LLM semantic scoring are areas of ongoing research.
- Multilingual/generalization: Most systems assume English and closed domains; future research will extend to multilingual and low-resource settings, adaptive aspect schemes, and cross-domain adaptation.
Frontiers comprise joint end-to-end aspect discovery and summarization, energy-efficient SLMs, hierarchical aspect modeling, retrieval augmentation, and semi-supervised/universal schemes for open-world aspect induction.