
Terminology Concordant Queries (TCQ)

Updated 14 January 2026
  • TCQ is a rigorously defined information retrieval query that includes exact domain-specific technical terms to ensure answerability from specified passages.
  • The TCQ generation pipeline leverages document layout detection, passage chunking, and iterative LLM-guided densification to inject precise terminology.
  • TCQs enable controlled benchmarking of IR models: contrasting them with terminology-agnostic queries isolates lexical-matching capability and highlights practical retrieval strengths and limitations.

A Terminology Concordant Query (TCQ) is a rigorously constructed information retrieval (IR) query that, by definition, contains one or more domain-specific technical terms verbatim and is answerable from a specified source passage. TCQs were introduced in the context of the STELLA framework for aerospace IR benchmarking, which provides controlled, paired query sets to disentangle and analyze lexical and semantic matching in embedding and retrieval models (Kim, 7 Jan 2026). Each TCQ is designed to stress-test a system’s ability to perform exact lexical matching against domain terminology, in contrast to Terminology Agnostic Queries (TAQs), which omit surface-form technical terms and instead employ descriptive paraphrases.

1. Formal Definition and Construction

Given a passage set $P = \{p_1, \ldots, p_N\}$ (e.g., from NASA Technical Reports) and a domain-specific terminology dictionary $T = \{t_1, \ldots, t_M\}$ constructed from $P$, a TCQ is defined as follows:

$$Q_{\mathrm{TCQ}} = \{\, q \in Q \mid \exists p \in P,\ \exists t \in T_p \subseteq T,\ \exists i \in \mathrm{Intent} : q = \mathrm{TCQGen}(p, T_p, i) \land t \in q \,\}$$

Here, $T_p = T \cap \mathrm{tokens}(p)$ denotes the set of dictionary terms present in passage $p$, $\mathrm{Intent}$ is a finite set of information-seeking objectives (covering definition, numeric, procedural/operational, component, and anomaly categories), and $\mathrm{TCQGen}$ is a query generation function producing exactly one TCQ per $(p, i)$ pair. Every TCQ includes at least one member of $T_p$ verbatim (preserving original spelling, case, and hyphenation) and is guaranteed to be answerable from $p$. In contrast, a TAQ constructed for the same passage and intent excludes all $t \in T_p$ from its surface form (Kim, 7 Jan 2026).
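The membership condition above can be sketched as a minimal check in Python. The helpers `passage_terms` and `is_tcq` are hypothetical (the framework's actual implementation is not specified), substring matching stands in for token intersection, and answerability from $p$, which the pipeline enforces via LLM self-reflection, is not checked here:

```python
def passage_terms(passage: str, dictionary: set[str]) -> set[str]:
    """T_p: dictionary terms that occur verbatim in passage p
    (case and hyphenation preserved; plain substring match here,
    where the real pipeline intersects T with tokens(p))."""
    return {t for t in dictionary if t in passage}

def is_tcq(query: str, passage: str, dictionary: set[str]) -> bool:
    """A query qualifies as a TCQ for p only if it contains at least one
    term of T_p verbatim; answerability from p is not checked here."""
    return any(t in query for t in passage_terms(passage, dictionary))

T = {"CFD", "Navier-Stokes", "staged combustion cycle"}
p = "The staged combustion cycle raises chamber pressure to 10 MPa."
print(is_tcq("Which staged combustion cycle parameters matter?", p, T))
```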

2. Generation Pipeline in the STELLA Framework

The process for constructing TCQs (as implemented in STELLA) consists of the following stages:

  1. Document Layout Detection: NASA Technical Report PDFs are processed with DocLayout-YOLO, which detects and orders text regions sequentially while filtering low-confidence non-text blocks (confidence < 0.25).
  2. Passage Chunking: The Recursive-Token-Chunker splits text into overlapping 100-token chunks, yielding a passage set $P$ of approximately 2.4 million passages.
  3. Terminology Dictionary Construction:
    • Regex-based extraction of acronyms (e.g., CFD), hyphenated compounds (e.g., Navier-Stokes), and technical notation (e.g., 3-sigma, H$_2$O).
    • Filtering by: document frequency $\geq 10$, part-of-speech restriction to (proper) nouns, and a specificity threshold $\tau \leq 3.5$ via wordfreq, selecting rare or technical vocabulary.
  4. Candidate Passage Selection: Passages with $|T_p| \geq 5$ are intent-classified via prompt-based GPT-5 and clustered via EmbeddingGemma-300m and $k$-medoids ($k = 5$); 100 passages per intent are selected (500 in total).
  5. Dual-type Query Generation (TCQ focus):
    • For passage $p$ and intent $i$, query generation proceeds in three iterative LLM-guided steps:
      • Seed: Generate an intent-compliant question omitting all terms in $T_p$.
      • First Densification: One term $t_a \in T_p$ is injected verbatim via Chain-of-Density (CoD) and validated through self-reflection for answerability, format, length, and intent compliance.
      • Second Densification: Another term $t_b \ne t_a$ is injected.
    • At each step, recognized and added entities are tracked, and the process halts if format/length/intent criteria are not met. The final query $q_3$ is returned as the TCQ.

Pseudocode:

seed ← LLM.query_seed(p, i, ban_terms=T_p)
t_a, t_b ← sample_two_distinct_terms(T_p)
q ← seed
q ← LLM.coDense(q, add_term=t_a, self_reflect=True)
q ← LLM.coDense(q, add_term=t_b, self_reflect=True)
return q
When producing non-English TCQs, the framework preserves technical terminology in English while translating only the surrounding descriptive text.
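The regex extraction and frequency filtering of pipeline step 3 can be sketched as follows. The patterns, the `ZIPF` table (a stand-in for the wordfreq library's Zipf-scale frequencies, with illustrative values only), and the `extract_terms` helper are assumptions, and the part-of-speech restriction is omitted:

```python
import re

# Simplified patterns for the three surface forms named above:
ACRONYM    = re.compile(r"\b[A-Z]{2,6}\b")                 # e.g. CFD, LOX
HYPHENATED = re.compile(r"\b[A-Za-z]+(?:-[A-Za-z]+)+\b")   # e.g. Navier-Stokes
NOTATION   = re.compile(r"\b\d+-[A-Za-z]+\b|\bH2O\b")      # e.g. 3-sigma, H2O

# Stand-in for wordfreq.zipf_frequency(word, "en"); illustrative values.
ZIPF = {"CFD": 2.9, "Navier-Stokes": 2.5, "3-sigma": 2.0, "THE": 7.0}

def extract_terms(text: str, doc_freq: dict[str, int],
                  min_df: int = 10, tau: float = 3.5) -> set[str]:
    """Regex-extract candidate terms, then keep rare/technical ones:
    document frequency >= min_df and Zipf specificity <= tau
    (lower Zipf means a rarer, more domain-specific word)."""
    candidates: set[str] = set()
    for pattern in (ACRONYM, HYPHENATED, NOTATION):
        candidates.update(pattern.findall(text))
    return {c for c in candidates
            if doc_freq.get(c, 0) >= min_df and ZIPF.get(c, 0.0) <= tau}
```

In the real pipeline the document-frequency counts come from the full 2.4M-passage collection; here they would be supplied as a precomputed dictionary.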

3. Evaluation Methods and Lexical-Matching Metrics

TCQs are used to measure the lexical-matching capability of IR models, for which classical probabilistic ranking functions such as BM25 are used. Given a query $q$ and passage $p$:

$$\mathrm{BM25}(q,p) = \sum_{w \in q} \mathrm{IDF}(w) \cdot \frac{f(w,p)\,(k_1+1)}{f(w,p) + k_1 \left(1 - b + b\,|p|/\mathrm{avgdl}\right)}$$

where $f(w,p)$ is the term frequency of $w$ in $p$, $|p|$ is the passage length, $\mathrm{avgdl}$ is the average passage length, $k_1 \approx 1.2$, $b \approx 0.75$, and $\mathrm{IDF}(w)$ is the inverse document frequency. Retrieval is evaluated via normalized discounted cumulative gain at top $k$ (nDCG@$k$):

$$\mathrm{DCG}@k = \sum_{i=1}^{k} \frac{2^{\mathrm{rel}_i} - 1}{\log_2(i+1)}, \qquad \mathrm{nDCG}@k = \frac{\mathrm{DCG}@k}{\mathrm{IDCG}@k}$$

The lexical-dependency gap $\Delta_i(M)$ for a model $M$ and intent $i$ is defined as the nDCG@10 difference between TCQs and TAQs:

$$\Delta_i(M) = \mathrm{nDCG}@10_M(\mathrm{TCQ}) - \mathrm{nDCG}@10_M(\mathrm{TAQ})$$

Large $\Delta_i(M)$ values indicate models reliant on exact lexical overlap, while small values indicate stronger semantic matching (Kim, 7 Jan 2026).
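The ranking function and metrics above can be sketched in a few lines of Python; a minimal sketch, assuming tokenized inputs and toy relevance lists. The log-smoothed IDF variant is one common choice (the text leaves $\mathrm{IDF}(w)$ generic):

```python
import math

def bm25(query, passage, df, n_docs, avgdl, k1=1.2, b=0.75):
    """Okapi BM25 of a tokenized passage for a tokenized query, matching
    the formula above; df maps term -> document frequency over the
    collection. IDF here is the log-smoothed variant (an assumption)."""
    score = 0.0
    for w in query:
        f = passage.count(w)
        if f == 0:
            continue
        idf = math.log(1 + (n_docs - df.get(w, 0) + 0.5) / (df.get(w, 0) + 0.5))
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(passage) / avgdl))
    return score

def ndcg_at_k(rels, k=10):
    """nDCG@k over a ranked list of graded relevances rel_i."""
    dcg = lambda rs: sum((2**r - 1) / math.log2(i + 2) for i, r in enumerate(rs[:k]))
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0

def lexical_dependency_gap(tcq_rels, taq_rels, k=10):
    """Delta_i(M): mean nDCG@k on TCQ runs minus mean nDCG@k on TAQ runs."""
    mean = lambda runs: sum(ndcg_at_k(r, k) for r in runs) / len(runs)
    return mean(tcq_rels) - mean(taq_rels)
```

A passage containing the query's technical terms verbatim scores nonzero BM25, while a semantically related passage without them scores exactly zero, which is precisely the contrast the TCQ/TAQ pairing probes.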

4. Illustrative Example

Consider the following passage and terminology:

  • Passage: "The LOX/hydrocarbon propellant combination in this stage features a staged combustion cycle. Chamber pressure is maintained at 10 MPa, and injector design mitigates combustion instability."
  • Terminology: $T_p$ = { "propellant", "staged combustion cycle", "combustion instability", ... }
  • Intent: Procedure/Operation.

Step-wise TCQ Generation:

  • Seed (no terms): "What pressure and design features control stable operation in this engine stage?"
  • Densify (+“propellant”): "What pressure and propellant flow arrangements ensure stable operation in this engine stage?"
  • Densify again (+“staged combustion cycle”): "What propellant flow and staged combustion cycle parameters maintain stable operation at 10 MPa?"
  • Final TCQ: "What propellant flow and staged combustion cycle parameters maintain stable operation at 10 MPa?"

This process strictly enforces the inclusion of at least one technical term verbatim from the source and yields queries with high terminological fidelity suitable for lexical-matching metrics (Kim, 7 Jan 2026).

5. Comparative Role versus Terminology Agnostic Queries (TAQs)

Each (TCQ, passage) pair is complemented with a TAQ, generated by paraphrasing technical terms via context-derived explanations (e.g., substituting "propellant" with "chemical substance burned to generate thrust"). The use of these dual query types permits a principled, quantitative disentanglement of lexical and semantic retrieval capacity for IR models. BM25 ranking assesses pure lexical overlap, while dense-embedding retrieval (e.g., Llama-Embed-Nemotron) evaluates semantic matching. The paired query construction allows the controlled measurement of how much retrieval performance is attributable to surface-form term recognition versus semantic generalization.
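The term-to-paraphrase substitution can be sketched as plain string replacement; `to_taq` is a hypothetical helper and the paraphrase table a toy input, whereas the actual framework derives the context-sensitive explanations via an LLM:

```python
def to_taq(tcq: str, paraphrases: dict[str, str]) -> str:
    """Derive a TAQ by replacing each technical term with its
    context-derived explanation. Longest terms are replaced first,
    so multiword terms are handled before their substrings."""
    taq = tcq
    for term in sorted(paraphrases, key=len, reverse=True):
        taq = taq.replace(term, paraphrases[term])
    return taq

q = "What propellant flow maintains stable operation at 10 MPa?"
print(to_taq(q, {"propellant": "chemical substance burned to generate thrust"}))
```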

6. Strengths and Limitations

Benefits:

  • TCQs afford precise measurement of lexical retrieval, since each query contains explicit technical terms, thereby isolating the ability of models to match on surface form.
  • TCQs, when paired with TAQs, enable controlled analysis of the lexical-semantic matching spectrum and quantify the retrieval dependence on direct terminology overlap.
  • The construction of TCQs reflects real-world domain practices: engineers and practitioners often use exact part names, acronyms, and domain-specific jargon in practical search scenarios, enhancing ecological validity.

Limitations:

  • TCQs are machine-generated and can exhibit a uniform or over-structured style compared to genuine user queries.
  • They emphasize lexical matching exclusively and thus do not test semantic retrieval for synonymy or innovative paraphrasing of terms.
  • TCQs are restricted to queries answerable from a single passage, excluding broader multi-hop reasoning or negative (unanswerable) query types.
  • Cross-lingual TCQs retain English technical terms, which may not always mirror actual user translation/reformulation practices in non-English contexts (Kim, 7 Jan 2026).

A plausible implication is that while TCQs provide a robust framework for benchmarking lexical retrieval, a holistic evaluation of IR systems also requires complementary query formats such as TAQs and realistic user logs.

7. Applications and Impact in Domain-Specific IR Benchmarking

TCQs form a core component of the STELLA aerospace IR benchmark, which enables reproducible and interpretable evaluation of lexical and semantic search models in technical document collections. Their design allows benchmarking classical methods (e.g., BM25) directly against deep embedding models, with evidence showing that lexical methods remain competitive in technical domains where exact term matching is essential. The approach pioneered by TCQs advances domain-specific IR benchmarking by ensuring terminological rigor and systematic measurement of retrieval models’ handling of surface-form terminology, which is crucial in high-stakes engineering and scientific information systems (Kim, 7 Jan 2026).
