Concepts of Interest (COIs)
- Concepts of Interest (COIs) are defined as technical constructs representing key entities, topics, or patterns, grounded in precise mathematical formulations.
- They are extracted using methods like text mining, representation learning, statistical inference, and evolutionary search to achieve interpretable outcomes.
- Applications span biomedical texts, social media targeting, recommender systems, and learning analytics, enhancing data-driven decision making.
A Concept of Interest (COI) is a technical construct used across scientific disciplines to identify, represent, and extract entities, topics, or patterns that carry heightened relevance to a task, user, or domain. The formal instantiation of COIs varies: in knowledge graphs they may correspond to entity-relation pairs or compositional embeddings; in text analysis, to semantic categories or biomedical term pairs; in learning analytics, to course concepts spatiotemporally anchored to instructional content; in deep learning, to directions in latent space that modulate predictive decisions; and in multi-objective optimization, to concepts defined by their capacity to yield solutions within pre-defined performance criteria. Common to all approaches is the drive to algorithmically define, filter, and rank COIs to focus downstream computation and deliver precise, interpretable results. This article surveys key methodologies for concept of interest identification, mathematical formulation, empirical evaluation, and operational integration, as established in recent research.
1. Formal Definitions and Mathematical Representation
COI definitions are highly context-dependent, yet share a foundational emphasis on explicit mathematical structure:
- Categorical Entities (Social Media): In tweet analysis, a COI is a topical category (e.g., "Food", "Politics") matched by intersection scores between user-generated keywords and pre-defined tag sets (Hossen et al., 2018). The EICV measure quantifies the normalized overlap between a tweet's keyword set and each category's tag set, with categorization gated by relative thresholds.
- Knowledge Graph Embeddings (Recommender Systems): InBox treats a concept as an axis-aligned box in embedding space, parameterized by a center vector and an offset vector that together bound the box in each dimension. COIs emerge as intersections of multiple box embeddings, supporting compositionality and fine-grained querying (Xu et al., 19 Mar 2024).
- Latent Directions (Deep Representations): Concept discovery in neural networks models each COI as a unit vector v in activation space; activation scores are inner products between layer activations and v, and relevance for a class is statistically inferred via the variance of directional derivatives along v (Janik et al., 2022).
- Temporal Abstractions (RL): In option discovery, a concept of interest is specified by a differentiable interest function i_ω(s) that modulates the initiation probability of option ω at state s (Khetarpal et al., 2020).
- Design Optimization: The WOI-based framework defines COIs as concepts for which there exists at least one solution whose objective vector falls inside the window of interest (WOI), a polyhedral region in objective space encoding practitioner-defined aspiration levels (Farhi et al., 2017).
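The intersection-based categorization used for tweet profiling can be sketched as follows. This is a hypothetical illustration of overlap scoring with relative thresholding, not the exact EICV formulation from Hossen et al.; the tag sets, threshold value, and function names are assumptions.

```python
# Hypothetical sketch of intersection-based tweet categorization
# (illustrative only; not the exact EICV measure from Hossen et al., 2018).
def intersection_scores(keywords, tag_sets):
    """Score each candidate category by normalized keyword/tag overlap."""
    kw = set(keywords)
    return {cat: len(kw & tags) / max(len(kw), 1) for cat, tags in tag_sets.items()}

def categorize(keywords, tag_sets, rel_threshold=0.5):
    """Assign every category whose score is within rel_threshold of the best."""
    scores = intersection_scores(keywords, tag_sets)
    best = max(scores.values(), default=0.0)
    return [c for c, s in scores.items() if best > 0 and s >= rel_threshold * best]
```

The relative threshold allows a tweet to be assigned to several COIs when their overlap scores are comparable.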
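The center/offset box parameterization can be made concrete with a minimal sketch: membership is a per-dimension bound check, and the intersection of axis-aligned boxes is itself a box. This follows the spirit of box embeddings such as InBox, but the dimensions and helper names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Minimal sketch of axis-aligned box embeddings with a center/offset
# parameterization (illustrative; not the InBox implementation).
def in_box(point, center, offset):
    """A point lies in a box iff it is within `offset` of `center` per dimension."""
    return bool(np.all(np.abs(point - center) <= offset))

def intersect_boxes(centers, offsets):
    """The intersection of axis-aligned boxes is itself a box (possibly empty)."""
    lows = np.max(centers - offsets, axis=0)
    highs = np.min(centers + offsets, axis=0)
    center = (lows + highs) / 2
    offset = (highs - lows) / 2  # any negative entry means an empty intersection
    return center, offset
```

Composite COIs then correspond to querying which item points fall inside the intersection box of several concept boxes.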
2. COI Extraction Methodologies
Each domain supports distinctive pipelines for COI identification, typically optimized for scale, noise-robustness, and interpretability.
- Text Mining: A two-stage filtering method leverages readability metrics (Fog Index) to select candidate sentences containing biomedical concept pairs, followed by association matrix construction and statistical refinement via the harmonic mean (HM) of PPV and sensitivity. The top-k pairs by HM (k = 10 in the reference study) are retained as final COIs, achieving ~75–80% semantic accuracy against UMLS validation (Shams et al., 2013).
- Representation Learning: Concept probing employs informativeness (normalized mutual information) and regularity (logistic-probe accuracy) to select optimal layers for COI decoding. A composite of the two scores identifies layers where concepts are encoded in linearly decodable form (Ribeiro et al., 24 Jul 2025).
- Statistical Inference: High-dimensional concept discovery uses large-scale multiple hypothesis testing (Benjamini–Hochberg or empirical Bayes FDR) to control for false positives, based on activation statistics or directional derivative variance over candidate directions (Janik et al., 2022).
- Interest Functions (Temporal Abstraction): The interest-option-critic algorithm simultaneously learns intra-option policy, termination, and interest parameters via gradient-based updates, biasing option selection toward regions where the interest function is high (Khetarpal et al., 2020).
- Evolutionary Search: WOI-MOEA employs multi-population evolutionary algorithms, ranking solution candidates and allocating generation quotas according to WOI-distance; concepts with solutions in the WOI are labeled as COIs. Dynamic resource allocation accelerates the discovery of satisficing concepts (Farhi et al., 2017).
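The harmonic-mean ranking step from the text-mining pipeline above can be sketched directly; the PPV/sensitivity values attached to each candidate pair are hypothetical placeholders for statistics that would come from the association matrix.

```python
# Sketch of the harmonic-mean ranking step for candidate concept pairs;
# the per-pair PPV/sensitivity inputs are hypothetical placeholders.
def harmonic_mean(ppv, sensitivity):
    """HM of positive predictive value and sensitivity (an F1-style score)."""
    if ppv + sensitivity == 0:
        return 0.0
    return 2 * ppv * sensitivity / (ppv + sensitivity)

def top_k_pairs(candidates, k=10):
    """Rank (pair, ppv, sensitivity) candidates by HM and keep the top k as COIs."""
    scored = [(pair, harmonic_mean(p, s)) for pair, p, s in candidates]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]
```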
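The multiple-testing control mentioned for statistical concept screening is the standard Benjamini–Hochberg step-up procedure, which can be sketched as follows; the p-values would in practice come from tests on activation statistics over candidate directions.

```python
import numpy as np

# Standard Benjamini-Hochberg step-up procedure, as used for screening
# candidate concept directions at a target false discovery rate.
def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean mask of rejected hypotheses at FDR level alpha."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # Find the largest k with p_(k) <= (k/m) * alpha; reject hypotheses 1..k.
    below = ranked <= (np.arange(1, m + 1) / m) * alpha
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject
```

Only candidate directions surviving the cutoff are promoted to COIs, which keeps the discovered concept set from being inflated by chance correlations.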
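Interest-modulated option initiation can be illustrated with a toy tabular sketch: the chance of starting an option is biased by its interest value at the current state. The dictionaries and sampling rule below are illustrative assumptions, not the gradient-based interest-option-critic update from Khetarpal et al.

```python
import random

# Toy sketch of interest-modulated option initiation (illustrative only;
# not the interest-option-critic algorithm itself). Tabular policy and
# interest values stand in for learned, differentiable functions.
def sample_option(state, options, interest, policy_over_options):
    """Pick an option with probability proportional to policy * interest."""
    weights = [policy_over_options[(state, o)] * interest[(state, o)] for o in options]
    total = sum(weights)
    probs = [w / total for w in weights]
    return random.choices(options, weights=probs, k=1)[0]
```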
3. Evaluation Metrics and Ranking Criteria
Domain-appropriate metrics are crucial for COI filtering and ranking. Representative examples include:
| Method | Key Metric/Formulation | Thresholding/Selection Mechanism |
|---|---|---|
| EICV Tweet Profiling (Hossen et al., 2018) | Intersection size / normalized EICV | Relative thresholds on category scores |
| Fog Index Text Mining (Shams et al., 2013) | Harmonic mean of PPV, sensitivity | Top 10 HM-scoring concept pairs |
| Knowledge Graph Embedding (Xu et al., 19 Mar 2024) | Point-box or box-box distance | Points inside intersection boxes |
| Concept Probing (Ribeiro et al., 24 Jul 2025) | Composite informativeness-regularity layer score | Layer maximizing the composite score |
| Statistical Screening (Janik et al., 2022) | Test statistic (directional-derivative std) | BH/FDR cutoff |
| WOI-MOEA Optimization (Farhi et al., 2017) | Distance to WOI | Concepts with at least one solution inside the WOI |
Quantitative validation against semantic networks (e.g., UMLS), held-out accuracy, recall@20, and search efficiency are commonly reported, with box-based recommenders achieving 13–22% recall improvements over baselines (Xu et al., 19 Mar 2024) and WOI-based MOEAs halving evaluation budgets compared to sequential search (Farhi et al., 2017).
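The recall@20 metric reported for the box-based recommenders is the standard fraction of relevant items retrieved in the top-k ranking; a minimal sketch, with toy item lists as illustrative inputs:

```python
# Recall@k as commonly reported for concept-based recommendation.
def recall_at_k(ranked_items, relevant, k=20):
    """Fraction of relevant items that appear in the top-k of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    return hits / len(relevant)
```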
4. Ontological and Structural Models
COI ontologies formalize the mapping from data objects to abstract constructs:
- Multimodal Frames (Art Images): Social concepts are instantiated as SCMultiModalFrame classes linked to multisensory slots (colors, objects, actions) aggregated from corpus metadata and image analysis. The MUSCO ontology integrates DOLCE, DnS patterns, and SKOS hierarchies to manage over 166 social concepts across 70,000 artworks, enabling structured knowledge graph population for future multimodal inference (Pandiani et al., 2021).
- Spatiotemporal Anchoring (Learning Analytics): In COIVis, each COI is defined by a (concept, video interval, screen region) triplet that ties hierarchical course concepts to salient video intervals and precise screen regions via multimodal parsing. Learner engagement is quantified at the COI level through eye-tracking features (attention, cognitive load, interest, preference, synchronicity), supporting both cohort-level aggregation and individual drill-downs (Zhou et al., 7 Dec 2025).
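A spatiotemporally anchored COI of the kind described for COIVis could be modeled as a small data structure; the field names and the mean-based aggregation below are hypothetical, illustrating the (concept, video interval, screen region) triplet rather than the paper's actual data model or API.

```python
from dataclasses import dataclass, field

# Hypothetical data model for a spatiotemporally anchored COI, following the
# (concept, video interval, screen region) triplet described for COIVis.
# Field names and the aggregation rule are illustrative assumptions.
@dataclass
class AnchoredCOI:
    concept: str                      # hierarchical course concept label
    interval: tuple                   # (start_sec, end_sec) within the video
    region: tuple                     # (x, y, width, height) on screen
    attention_samples: list = field(default_factory=list)

    def mean_attention(self) -> float:
        """Cohort-level engagement as the mean of per-learner attention scores."""
        if not self.attention_samples:
            return 0.0
        return sum(self.attention_samples) / len(self.attention_samples)
```

Per-learner drill-downs would keep the raw samples, while cohort views aggregate them at the COI level.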
5. Case Studies and Practical Insights
Applied research demonstrates the efficacy and limits of these frameworks:
- Biomedical Texts: Two-stage FI-based filtering extracts connected concepts in 24-paper corpora with 75–80% semantic validation against external ontologies (Shams et al., 2013).
- Social Media Targeting: EICV achieves 93.03% accuracy in tweet-topic assignment for up to 25 COIs, with robustness dependent on tag coverage and threshold calibration. Limitations include scalability of manual tag curation and language specificity (Hossen et al., 2018).
- Recommender Systems: InBox box embeddings show tight clustering and coverage in concept-based item retrieval, with empirical gains over graph-based and point-based baselines (Xu et al., 19 Mar 2024).
- Representation Discovery: Screening and visualization workflows balance multiple-testing rigor with human-in-the-loop refinement, managing high-dimensional concept spaces and mitigating automated overgeneration (Janik et al., 2022).
- Optimization: WOI-MOEA accelerates the identification of satisficing concepts in design libraries, directly aligning search heuristics with practitioner-defined aspiration levels (Farhi et al., 2017).
6. Guidelines, Limitations, and Prospects
Practitioner recommendations and identified limitations emphasize:
- Curate comprehensive tag sets or concept definitions to maximize COI extraction accuracy.
- Use balanced samples, robust statistical corrections (e.g., BH, FDR), and interpretable feature mappings for screening in deep representations.
- Exploit multimodal and spatiotemporal anchoring for granular engagement analytics and ontology construction.
- Anticipate scalability issues in manual tag-generation or search space coverage; consider algorithmic expansion via external APIs or unsupervised clustering.
- Prefer resource-efficient evolutionary or box-based search where practical constraints dictate computational budget.
Further refinements, such as dynamic thresholding, automatic ambiguity detection, weighted intersection scoring, and multilingual expansion, represent ongoing research directions outlined in foundational studies (Hossen et al., 2018, Xu et al., 19 Mar 2024, Janik et al., 2022, Zhou et al., 7 Dec 2025).
References:
- "Extracting Connected Concepts from Biomedical Texts using Fog Index" (Shams et al., 2013)
- "Discovering Users Topic of Interest from Tweet" (Hossen et al., 2018)
- "Options of Interest: Temporal Abstraction with Interest Functions" (Khetarpal et al., 2020)
- "Discovering Concepts in Learned Representations using Statistical Inference and Interactive Visualization" (Janik et al., 2022)
- "Automatic Modeling of Social Concepts Evoked by Art Images as Multimodal Frames" (Pandiani et al., 2021)
- "InBox: Recommendation with Knowledge Graph using Interest Box Embedding" (Xu et al., 19 Mar 2024)
- "Concept Probing: Where to Find Human-Defined Concepts (Extended Version)" (Ribeiro et al., 24 Jul 2025)
- "COIVis: Eye tracking-based Visual Exploration of Concept Learning in MOOC Videos" (Zhou et al., 7 Dec 2025)
- "Window-of-interest based Multi-objective Evolutionary Search for Satisficing Concepts" (Farhi et al., 2017)