Semantic Probability Calculation
- Semantic probability calculation is a framework that quantifies fuzzy and graded truth values by unifying logical (semantic) probability with statistical likelihoods.
- It employs a semantic Bayes’ formula and the Semantic Information Measure to translate between event frequencies and degrees of truth, optimizing information content and confirmation.
- Applications include falsification of hypotheses, resolving paradoxes, enhancing neural model interpretability, and improving semantic communications in AI.
Semantic probability calculation refers to frameworks and methodologies for assigning, manipulating, and optimizing probabilistic degrees of belief attached to semantic objects such as hypotheses, propositions, or statements, going beyond classical statistical probability. Its objective is to quantify precisely the information content, support, or confirmation that data afford to potentially fuzzy, graded, or logically structured assertions, thereby unifying tools from logic, Shannon information theory, hypothesis testing, and Bayesian rationality. The modern theory distinguishes logical (semantic) probability, which encodes fuzziness or degree of truth via truth functions, from statistical (sampling) probability, which encodes event frequencies or likelihoods; the two require distinct but interlinked inference mechanisms. This article surveys the main developments, foundational formulas, and central applications of semantic probability calculation, with emphasis on the Semantic Information Measure (SIM) and degree of confirmation ($b^*$) paradigms.
1. Fundamentals: Logical Probability vs. Statistical Probability
Semantic probability frameworks strictly distinguish two probabilistic domains:
- Statistical Probability (SP; $P$): Encodes event frequencies, likelihoods, or sampling distributions, always normalized so that for a sample space $X = \{x_1, x_2, \ldots\}$, $\sum_i P(x_i) = 1$. In inference, $P(x \mid \theta_j)$ is the likelihood of observing evidence $x$ given hypothesis $\theta_j$.
- Logical Probability (LP; $T$): Represents fuzzy or graded truth values, i.e., a membership function, assigned to propositions or predicates. For predicate $\theta_j$ and evidence $x$, $T(\theta_j \mid x)$ measures the degree to which $x$ satisfies $\theta_j$. The prior logical probability of $\theta_j$, denoted $T(\theta_j)$, is the expected truth under the prior distribution:

$$T(\theta_j) = \sum_x P(x)\, T(\theta_j \mid x).$$

LP is not normalized over mutually overlapping predicates, typically leading to $\sum_j T(\theta_j) > 1$.

A key formula linking SP and LP, called the semantic Bayes' formula, is:

$$P(x \mid \theta_j) = \frac{T(\theta_j \mid x)\, P(x)}{T(\theta_j)}.$$

This enables translation between likelihoods and fuzzy truth assessments (Lu, 2016).
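A minimal numerical sketch of these definitions (the sample space, prior, and the two overlapping truth functions are invented for illustration):

```python
import numpy as np

# Prior statistical distribution P(x) over a small sample space (illustrative).
P_x = np.array([0.4, 0.3, 0.2, 0.1])            # sums to 1

# Truth functions T(theta_j|x) for two overlapping fuzzy predicates,
# e.g. "x is small" and "x is moderate" (membership values in [0, 1]).
T_small    = np.array([1.0, 0.8, 0.3, 0.1])
T_moderate = np.array([0.2, 0.9, 1.0, 0.5])

def logical_probability(T, P):
    """Prior logical probability: T(theta) = sum_x P(x) T(theta|x)."""
    return float(np.dot(P, T))

def semantic_bayes(T, P):
    """Semantic Bayes' formula: P(x|theta) = T(theta|x) P(x) / T(theta)."""
    return T * P / logical_probability(T, P)

print(logical_probability(T_small, P_x)
      + logical_probability(T_moderate, P_x))   # > 1: LP is not normalized
print(semantic_bayes(T_small, P_x).sum())       # = 1: the SP side is normalized
```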
2. Semantic Information Measure (SIM)
The SIM generalizes classical Shannon information and Fisher’s likelihood ratio to account for the semantic content of fuzzy hypotheses and graded truths.
- Single-case semantic information is given by:

$$I(x; \theta_j) = \log \frac{T(\theta_j \mid x)}{T(\theta_j)} = \log \frac{P(x \mid \theta_j)}{P(x)}.$$

This ratio tests the sharpness of $\theta_j$'s prediction on $x$ against the prior logical probability of $\theta_j$. The less a priori likely $\theta_j$ is (small $T(\theta_j)$), and the more accurately it predicts the evidence (large $T(\theta_j \mid x)$), the more semantic information it imparts.
- Average semantic information (averaged over possible evidence values under the sampling distribution $P(x \mid y_j)$):

$$I(X; \theta_j) = \sum_i P(x_i \mid y_j) \log \frac{T(\theta_j \mid x_i)}{T(\theta_j)}.$$

This expression coincides with a log-likelihood ratio averaged under the sampling law, with the "semantic" likelihood $P(x \mid \theta_j)$ in the numerator. The measure decomposes into Kullback–Leibler divergences,

$$I(X; \theta_j) = D_{\mathrm{KL}}\big(P(x \mid y_j) \,\|\, P(x)\big) - D_{\mathrm{KL}}\big(P(x \mid y_j) \,\|\, P(x \mid \theta_j)\big),$$

unifying it with standard information-theoretic and log-likelihood-based criteria (Lu, 2016).
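The sketch below computes both quantities and checks the KL decomposition; the prior, truth function, and sampling distribution $P(x \mid y_j)$ are all assumed toy values:

```python
import numpy as np

P_x = np.array([0.4, 0.3, 0.2, 0.1])             # prior P(x) (assumed)
T   = np.array([1.0, 0.8, 0.3, 0.1])             # truth function T(theta|x)
P_x_given_y = np.array([0.6, 0.3, 0.08, 0.02])   # sampling law P(x|y_j) (assumed)

T_theta = float(np.dot(P_x, T))                  # prior logical probability

# Single-case semantic information: I(x; theta) = log T(theta|x)/T(theta).
I_single = np.log2(T / T_theta)
print(I_single)                  # positive exactly where T(theta|x) > T(theta)

# Average semantic information: sum_i P(x_i|y_j) log T(theta|x_i)/T(theta).
I_avg = float(np.sum(P_x_given_y * np.log2(T / T_theta)))

# KL decomposition: I = KL(P(x|y)||P(x)) - KL(P(x|y)||P(x|theta)).
P_x_given_theta = T * P_x / T_theta              # semantic Bayes posterior
kl = lambda p, q: float(np.sum(p * np.log2(p / q)))
assert np.isclose(I_avg, kl(P_x_given_y, P_x) - kl(P_x_given_y, P_x_given_theta))
print(I_avg)
```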
3. Degree of Confirmation and Optimization
Quantifying confirmation or degree of belief in (fuzzy) universal hypotheses necessitates introducing a "degree of belief" parameter $b$ into the truth function:

$$T(\theta \mid x) = \begin{cases} 1, & x \text{ is a positive example}, \\ b' = 1 - b, & x \text{ is a counterexample}. \end{cases}$$

For universal hypotheses, with evidence partitioned into positive examples and counterexamples, the optimal $b$ (the degree of confirmation, DOC) is determined by maximizing the average semantic information with respect to $b$. The closed-form solution is:

$$b^* = 1 - \frac{Q_0' / Q_0}{Q_1' / Q_1},$$

where $Q_0, Q_1$ are the prior probabilities of counterexamples and positive examples and $Q_0', Q_1'$ are the corresponding probabilities under the likelihood (Lu, 2016). The DOC increases as the observed counterexamples decrease relative to their prior rate.
Negative DOCs (interpreted as degrees of disbelief or refutation) naturally emerge when observed counterexamples exceed expectations. In such scenarios, the degree of belief is allowed to become negative, and the truth function is adapted accordingly.
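A sketch that verifies the closed form numerically under the two-valued truth-function parameterization above, and exhibits a negative DOC; all probabilities are invented for illustration:

```python
import numpy as np

def doc(Q1, Q0, Q1p, Q0p):
    """Closed-form degree of confirmation: b* = 1 - (Q0'/Q0) / (Q1'/Q1)."""
    return 1.0 - (Q0p / Q0) / (Q1p / Q1)

def avg_semantic_info(b_prime, Q1, Q0, Q1p, Q0p):
    """Average semantic information for the truth function that equals 1 on
    positive examples and b' = 1 - b on counterexamples."""
    T_theta = Q1 + b_prime * Q0                  # prior logical probability
    return Q1p * np.log2(1.0 / T_theta) + Q0p * np.log2(b_prime / T_theta)

# Illustrative numbers: counterexamples are rarer than their prior rate.
Q1, Q0   = 0.90, 0.10    # prior probabilities of positive / counter examples
Q1p, Q0p = 0.98, 0.02    # corresponding probabilities under the likelihood

# Grid search over the degree of disbelief b' recovers the closed form.
grid = np.linspace(1e-6, 1.0, 200_000)
b_prime_star = grid[np.argmax(avg_semantic_info(grid, Q1, Q0, Q1p, Q0p))]
print(1.0 - b_prime_star, doc(Q1, Q0, Q1p, Q0p))   # both close to 0.816

# If counterexamples exceed expectations, the DOC goes negative (disbelief).
print(doc(Q1, Q0, 0.80, 0.20))                     # b* = -1.25
```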
4. Semantic Probability in Practice: Falsification, Paradox Resolution, and Applications
The SIM framework serves as a falsification criterion in Popper's sense: a hypothesis with high specificity (low $T(\theta_j)$) and high observed truth receives high semantic information and confirmation, while a strict counterexample (with $T(\theta_j \mid x) = 0$) forces the semantic information to $-\infty$, indicating strict falsification.
For Hempel’s raven paradox, the framework breaks the logical symmetry between a universal hypothesis and its contrapositive in the presence of asymmetric evidence types (e.g., white chalk vs. black raven), since the confirmation increment is explicitly tied to the positive-to-counterexample ratios, not the abstract logical equivalence.
Medical test effectiveness is formalized via the DOC:

$$b^* = \frac{LR - 1}{LR} = 1 - \frac{P(\text{positive test} \mid \text{no disease})}{P(\text{positive test} \mid \text{disease})},$$

where $LR$ is the classical likelihood ratio of the test. This delivers a bounded ($b^* \le 1$), interpretable degree of confirmation, functionally paralleling but improving on the traditional, unbounded likelihood ratio (Lu, 2016).
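A small sketch comparing the unbounded likelihood ratio with the bounded confirmation degree, using assumed sensitivity and specificity values:

```python
def medical_doc(sensitivity, specificity):
    """Degree of confirmation of a positive test: b* = (LR - 1) / LR,
    with LR = sensitivity / (1 - specificity), the classical likelihood ratio."""
    lr = sensitivity / (1.0 - specificity)
    return (lr - 1.0) / lr

# Illustrative tests: LR is unbounded, while b* stays in a comparable range.
for sens, spec in [(0.99, 0.80), (0.90, 0.99), (0.999, 0.999)]:
    lr = sens / (1.0 - spec)
    print(f"LR = {lr:8.1f}   b* = {medical_doc(sens, spec):.4f}")
```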
5. Semantic Probability Calculation in Broader Contexts
Semantic probability calculations also appear in probabilistically typed grammars, as in computing the total probability that a probabilistic context-free grammar (PCFG) generates a symbolic expression, subject to semantic equivalence (e.g., over an equivalence class of algebraic expressions). While the general problem is undecidable, efficient algorithms exist for restricted grammar families (linear, polynomial, rational) using dynamic programming, inclusion–exclusion, and approximations (Primožič et al., 2022).
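A brute-force sketch of the problem statement (the toy grammar, its rule probabilities, and the depth bound are invented; Primožič et al. (2022) use dynamic programming over restricted grammar families rather than enumeration):

```python
from collections import defaultdict
from fractions import Fraction

# Toy PCFG over linear expressions in x (rule probabilities sum to 1):
#   E -> E + E [1/3] | x [1/3] | 1 [1/3]
# Two expressions are semantically equivalent iff they share the canonical
# form a*x + b, represented below as the pair (a, b).

def classes(depth):
    """Total probability of each semantic class, summed over derivations
    whose recursion depth is at most `depth` (a truncated lower bound)."""
    out = defaultdict(Fraction)
    out[(1, 0)] += Fraction(1, 3)                 # E -> x
    out[(0, 1)] += Fraction(1, 3)                 # E -> 1
    if depth > 0:
        sub = classes(depth - 1)
        for (a1, b1), p1 in sub.items():          # E -> E + E
            for (a2, b2), p2 in sub.items():
                out[(a1 + a2, b1 + b2)] += Fraction(1, 3) * p1 * p2
    return out

probs = classes(depth=4)
print(float(probs[(1, 1)]))   # P("x + 1" | grammar): 2/27 for this grammar
print(float(probs[(2, 2)]))   # richer classes keep gaining mass with depth
```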
Frameworks for inferring semantic probabilities in knowledge graphs assign confidences (typically via model-based or frequency-based estimates) to extracted semantic units. These confidences are often aggregated via entropy measures to optimize information transmission, or used to compute the marginal and conditional probabilities required for deductive and abductive inference (Zhao et al., 2023, Zhao et al., 2023).
Further, in applied settings such as place annotation, the joint spatial, temporal, and categorical probabilities are combined via the Bayesian criterion to assign a semantic probability to candidate annotations, refined by informative weightings such as TF–IDF (Cheng et al., 2022).
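A minimal sketch of such a Bayes-style combination (candidate places, component probabilities, and channel weights are all hypothetical; the actual models in Cheng et al. (2022) are estimated from trajectory data):

```python
import math

# Candidate place categories with assumed component probabilities for one
# stay point: spatial fit, temporal fit (visit-time profile), category prior.
candidates = {
    #             P_spatial  P_temporal  P_category
    "restaurant": (0.30,      0.50,       0.20),
    "office":     (0.40,      0.10,       0.30),
    "gym":        (0.20,      0.35,       0.10),
}

# TF-IDF-style informativeness weights for the three evidence channels
# (hypothetical values; a higher weight means a more discriminative channel).
w_spatial, w_temporal, w_category = 0.8, 1.2, 1.0

def semantic_score(p_s, p_t, p_c):
    """Weighted Bayes-style combination in log space."""
    return (w_spatial * math.log(p_s) + w_temporal * math.log(p_t)
            + w_category * math.log(p_c))

scores = {name: semantic_score(*ps) for name, ps in candidates.items()}
# Normalize into a semantic probability over the candidate annotations.
z = sum(math.exp(s) for s in scores.values())
posterior = {name: math.exp(s) / z for name, s in scores.items()}
print(max(posterior, key=posterior.get), posterior)
```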
6. Theoretical and Logical Underpinnings
Semantic probability calculation frameworks differ fundamentally from classical probabilistic logic, in which probabilities are assigned to logical sentences via distributions over possible worlds. While such approaches (cf. Nilsson-style probabilistic logic) suffice for modeling degrees of belief in classical sentences, they cannot directly express statistical regularities ("most birds fly"), default inference, or confirmation increments. These require richer semantic-probability frameworks involving truth functions, conditional likelihoods, and explicit treatment of reference classes, as addressed in Kyburg's interval-valued probabilistic semantics (Bacchus, 2013; Jr, 2013) and in expressive-logic probability assignments consistent with monotonic reasoning, inductive support, and confirmation of universal generalizations (Hutter et al., 2012).
7. Relationship to Neural and Statistical Learning
Semantic probability frameworks, including the SIM and the P–T probability framework, have clear analogs in interpretable AI and neural modeling. Neural classifiers' softmax outputs can be interpreted as truth functions $T(\theta_j \mid x)$, and the extended semantic Bayes' theorem unifies their outputs with likelihood-based and information-theoretic perspectives (Lu, 2020). Models capturing semantic specificity and entailment, such as probabilistic sentence encodings, exploit differential entropy and KL divergence over probabilistic sentence representations, naturally extending the logic of semantic probability to deep learning models (Chen et al., 2020).
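An illustrative sketch of this reading (the random logits, uniform empirical prior, and max-normalization step are assumptions for demonstration, not the exact construction of Lu (2020)):

```python
import numpy as np

rng = np.random.default_rng(0)

# Softmax outputs of a hypothetical 3-class classifier over a sample of inputs.
logits = rng.normal(size=(500, 3))
softmax = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Read each class column as a truth function T(theta_j|x), rescaled so its
# maximum over the sample equals 1 (a membership-style normalization; assumed).
T = softmax / softmax.max(axis=0, keepdims=True)

# Prior logical probability T(theta_j) under a uniform empirical prior P(x).
T_theta = T.mean(axis=0)

# Semantic information that each class label carries about input x_0:
# I(x_0; theta_j) = log T(theta_j|x_0) / T(theta_j).
print(np.log2(T[0] / T_theta))   # large where the class fits x_0 sharply
```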
References
- "Semantic Information Measure with Two Types of Probability for Falsification and Confirmation" (Lu, 2016)
- "P(Expression|Grammar): Probability of deriving an algebraic expression with a probabilistic context-free grammar" (Primožič et al., 2022)
- "Semantic Information Extraction for Text Data with Probability Graph" (Zhao et al., 2023)
- "A Joint Communication and Computation Design for Semantic Wireless Communication with Probability Graph" (Zhao et al., 2023)
- "An unsupervised approach for semantic place annotation of trajectories based on the prior probability" (Cheng et al., 2022)
- "Learning Probabilistic Sentence Representations from Paraphrases" (Chen et al., 2020)
- "The P-T Probability Framework for Semantic Communication, Falsification, Confirmation, and Bayesian Reasoning" (Lu, 2020)
- "Probability Distributions Over Possible Worlds" (Bacchus, 2013)
- "Semantics for Probabilistic Inference" (Jr, 2013)
- "Probabilities on Sentences in an Expressive Logic" (Hutter et al., 2012)