Posterior-Separable Cost Functions
- Posterior-separable cost functions depend only on the distribution of posterior beliefs an experiment induces, with total cost computed additively (as an expectation) over those posteriors.
- They satisfy key axioms such as mixture convexity, sub-additivity, and Blackwell monotonicity, which guarantee tractability and normative consistency in modeling information costs.
- Extensions like Max–Rényi forms introduce non-linearity to reconcile empirical deviations from classical models, enhancing predictive power in rational inattention and information economics.
Posterior-separable cost functions are a class of information cost models in which the total cost of an experiment or information structure depends solely on the distribution of posterior beliefs it induces, typically via an additive or expectation-based formula. The standard form arises in models of rational inattention and Bayesian experimentation, where the penalty for acquiring information is a function of the agent's posterior beliefs after observing the signal, often taking the form $C(\pi) = \mathbb{E}_{q \sim \pi}[\phi(q)] - \phi(\mu_0)$ for some convex potential $\phi$, where $\pi$ denotes the induced distribution over posteriors $q$ and $\mu_0$ the prior. These cost functions are normatively justified by axioms such as mixture linearity and sub-additivity, and they exhibit strong tractability in optimization and identification tasks. Recent developments have extended this framework by axiomatizing more general convex information costs and by introducing non-linear or "maximum-over-measures" forms, allowing for richer empirical predictions and accommodating experimental evidence that cannot be reconciled with classic posterior-separable costs.
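To make the expectation-based form concrete, the following is a minimal Python sketch (assuming a finite state space) that computes the cost of an experiment for two standard convex potentials: negative Shannon entropy, which yields the mutual-information cost, and a quadratic potential. The helper names and example numbers are illustrative assumptions, not taken from the source.

```python
import numpy as np

def posteriors_and_weights(prior, likelihood):
    """Posterior beliefs induced by a finite experiment.  `likelihood` has
    rows P(signal | state); returns one posterior per signal together with
    the signal probabilities (the weights of the posterior distribution)."""
    joint = prior[:, None] * likelihood          # P(state, signal)
    p_signal = joint.sum(axis=0)                 # marginal P(signal)
    posteriors = (joint / p_signal).T            # row s = P(state | signal s)
    return posteriors, p_signal

def posterior_separable_cost(prior, likelihood, phi):
    """C(pi) = E_pi[phi(q)] - phi(prior): expected potential of the induced
    posteriors minus the potential of the prior, so an uninformative
    experiment costs zero and convex phi makes the cost non-negative."""
    posteriors, weights = posteriors_and_weights(prior, likelihood)
    return float(weights @ np.array([phi(q) for q in posteriors]) - phi(prior))

def neg_entropy(q):
    """phi(q) = sum q log q; this potential yields the mutual-information
    (Shannon / rational-inattention) cost."""
    q = q[q > 0]
    return float(np.sum(q * np.log(q)))

def quadratic(q):
    """phi(q) = sum q^2; this potential yields a quadratic (posterior-variance)
    cost, another standard posterior-separable special case."""
    return float(np.sum(q ** 2))

prior = np.array([0.5, 0.5])
likelihood = np.array([[0.8, 0.2],    # P(signal | state 0)
                       [0.3, 0.7]])   # P(signal | state 1)
print(posterior_separable_cost(prior, likelihood, neg_entropy))
print(posterior_separable_cost(prior, likelihood, quadratic))
```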
1. Foundational Principles and Axiomatic Structure
Posterior-separability characterizes cost functions over experiments where the cost is determined exclusively by the induced distribution over posterior beliefs (or means) and can be decomposed additively across the constituent posteriors. The foundational axioms in this domain are:
- Mixture convexity: For any two experiments $\pi_1, \pi_2$ and any $\lambda \in [0,1]$, $C(\lambda \pi_1 + (1-\lambda)\pi_2) \le \lambda C(\pi_1) + (1-\lambda) C(\pi_2)$, so randomizing between experiments never costs more than the corresponding average of their costs.
- Sub-additivity: For independently bundled experiments, $C(\pi_1 \otimes \pi_2) \le C(\pi_1) + C(\pi_2)$, with identity additivity enforcing that bundling with an uninformative experiment leaves the cost unchanged (equivalently, uninformative experiments cost zero).
- Blackwell monotonicity (contextually): More informative experiments incur (weakly) higher cost.
These axioms rule out pathological cost behaviors and ensure that balance or averaging in information acquisition is rewarded, while "extreme" or highly differentiated outputs are penalized more heavily.
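The sketch below numerically illustrates two of these axioms for the canonical posterior-separable example, the mutual-information cost: sub-additivity for a bundle of conditionally independent signals, and Blackwell monotonicity under garbling. The specific experiments and the garbling matrix are arbitrary illustrative choices.

```python
import numpy as np

def mutual_information(prior, likelihood):
    """I(state; signal) for a finite experiment (rows of `likelihood` are
    P(signal | state)); equals the posterior-separable cost generated by the
    negative-entropy potential."""
    joint = prior[:, None] * likelihood
    p_signal = joint.sum(axis=0)
    indep = prior[:, None] * p_signal[None, :]
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / indep[mask])))

prior = np.array([0.5, 0.5])
pi1 = np.array([[0.8, 0.2], [0.3, 0.7]])   # experiment 1
pi2 = np.array([[0.6, 0.4], [0.1, 0.9]])   # experiment 2

# Sub-additivity: bundle the two experiments with conditionally independent
# signals; the joint likelihood over signal pairs is the row-wise outer product.
bundle = np.einsum("si,sj->sij", pi1, pi2).reshape(2, -1)
c1, c2, c12 = (mutual_information(prior, m) for m in (pi1, pi2, bundle))
print(c12 <= c1 + c2 + 1e-12)     # True: C(pi1 bundled with pi2) <= C(pi1) + C(pi2)

# Blackwell monotonicity: garbling experiment 1 (post-multiplying by a row-
# stochastic matrix) yields a less informative experiment with weakly lower cost.
garbling = np.array([[0.9, 0.1], [0.2, 0.8]])
print(mutual_information(prior, pi1 @ garbling) <= c1 + 1e-12)   # True
```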
2. Mathematical Representation: Max–Rényi and Special Cases
The general representation for convex information costs satisfying the above axioms is the "Max–Rényi" form

$$C(\pi) = \max_{\mu \in \mathcal{M}} D_{\mu}(\pi),$$

where $\mathcal{M}$ is a compact set of measures and each $D_{\mu}$ is a $\mu$-weighted Rényi divergence between the signal distributions induced by the experiment, with Rényi orders $\alpha > 0$, $\alpha \neq 1$. In the degenerate case where the measure reduces to a unit vector, $D_{\mu}$ converges to a weighted Kullback–Leibler divergence.
Two special cases within this framework are prominent:
| Cost Type | Mathematical Formulation | Key Property |
|---|---|---|
| Max–KL | Maximum, over the compact set of measures, of a weighted Kullback–Leibler divergence (the unit-vector / order-one limit) | Posterior-separable; mixture linear (under dilution linearity) |
| Rényi | Weighted Rényi divergence of order $\alpha \neq 1$ | Convex, monotone transformation of a posterior-separable cost |
The general Max–Rényi form admits mixture convexity and sub-additivity but allows non-linearity in mixtures (via the max operator), accounting for empirically observed deviations from the classic posterior-separable paradigm.
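Because the representation above is stated abstractly, the sketch below is only one stylized reading of it, under explicit assumptions: pairwise Rényi divergences are taken between state-conditional signal distributions, weighted by measures over ordered state pairs, and a finite set of (weight, order) pairs stands in for the compact set of measures. The function names, weight sets, and Rényi orders are illustrative assumptions rather than the source's exact definition.

```python
import numpy as np

def renyi(p, q, alpha):
    """Rényi divergence R_alpha(p || q) of order alpha between finite
    distributions; as alpha -> 1 it converges to the KL divergence."""
    if np.isclose(alpha, 1.0):
        m = p > 0
        return float(np.sum(p[m] * np.log(p[m] / q[m])))
    return float(np.log(np.sum(p**alpha * q**(1.0 - alpha))) / (alpha - 1.0))

def max_renyi_cost(likelihood, measures):
    """Stylized Max–Rényi cost of an experiment (rows = P(signal | state)):
    each 'measure' is a dict of weights over ordered state pairs plus a Rényi
    order; the cost is the largest weighted sum of pairwise Rényi divergences."""
    def value(weights, alpha):
        return sum(w * renyi(likelihood[i], likelihood[j], alpha)
                   for (i, j), w in weights.items())
    return max(value(w, a) for w, a in measures)

pi = np.array([[0.8, 0.2],    # P(signal | state 0)
               [0.3, 0.7]])   # P(signal | state 1)

# A finite stand-in for the compact set of measures: two weightings and orders.
measures = [({(0, 1): 0.6, (1, 0): 0.4}, 1.0),   # order 1: weighted KL (Max–KL special case)
            ({(0, 1): 0.5, (1, 0): 0.5}, 2.0)]   # order 2: a genuinely Rényi measure
print(max_renyi_cost(pi, measures))

# Sanity check on the degenerate case: orders near 1 approach the KL divergence.
print(np.isclose(renyi(pi[0], pi[1], 1.001), renyi(pi[0], pi[1], 1.0), atol=1e-3))
```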
3. Connections to Classical Cost Functions and Divergence Measures
Traditional posterior-separable costs, such as mutual information (Shannon entropy) and quadratic variance penalties, are particular cases where mixture linearity is not only satisfied but essential; that is, these cost functions depend only on the expected value of a convex potential of the posterior beliefs and therefore compose linearly under mixtures.
The extension to Rényi divergences introduces convex combinations and non-linear maxima, bridging the gap between Kullback–Leibler (KL) divergence-based costs and more general convex metrics for distinguishability between signal distributions. The representation unifies standard cases and newly tractable models under one umbrella, indicating that:
- KL-based costs are recovered as a special case under additional dilution linearity axioms,
- Rényi costs are monotone, convex transforms of posterior-separable costs subject to independence-type axioms.
4. Deviations from Mixture Linearity and Empirical Evidence
Recent experimental studies (e.g., Dean et al.) reveal behavior incompatible with strict mixture linearity: observed decision makers often use more distinct actions than the cardinality of the state space, while posterior-separable theory predicts that at most as many distinct actions are used as there are states. The Max–Rényi construction, by allowing non-linear mixing and the maximization over a set of divergence measures, generates optimal policies with richer action sets.
This suggests a pragmatic extension of theory: agents' information acquisition strategies cannot always be captured by expected posterior costs, and mixture-violating convex cost functions are needed to reconcile such empirical findings.
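To see how a max-over-measures cost breaks mixture linearity while a posterior-separable cost does not, the sketch below mixes two experiments (one or the other is run with known probability) and compares a mutual-information cost, which is exactly linear in the mixture, with a stylized Max–KL cost built from pairwise KL divergences between state-conditional signal distributions, which falls strictly below the mixture of its costs when the maximizing measure differs across the two experiments. All numbers and weightings are illustrative assumptions.

```python
import numpy as np

def kl(p, q):
    """Kullback–Leibler divergence between finite distributions."""
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

def mutual_information(prior, likelihood):
    """Posterior-separable (Shannon) cost of an experiment with rows P(signal | state)."""
    joint = prior[:, None] * likelihood
    p_sig = joint.sum(axis=0)
    return sum(p_sig[s] * kl(joint[:, s] / p_sig[s], prior) for s in range(len(p_sig)))

def max_kl_cost(likelihood, weight_set):
    """Stylized Max–KL cost: maximum over weight matrices of a weighted sum of
    KL divergences between state-conditional signal distributions."""
    n = likelihood.shape[0]
    def weighted(w):
        return sum(w[i, j] * kl(likelihood[i], likelihood[j])
                   for i in range(n) for j in range(n) if i != j)
    return max(weighted(w) for w in weight_set)

def mixture(pi_a, pi_b, lam):
    """Run pi_a with probability lam, pi_b otherwise, and reveal which was run:
    the mixed experiment's signal alphabet is the disjoint union of the two."""
    return np.hstack([lam * pi_a, (1 - lam) * pi_b])

prior = np.array([0.5, 0.5])
pi1 = np.array([[0.5, 0.5], [0.99, 0.01]])
pi2 = np.array([[0.99, 0.01], [0.5, 0.5]])
weights = [np.array([[0.0, 1.0], [0.0, 0.0]]),   # measure concentrated on state pair (0, 1)
           np.array([[0.0, 0.0], [1.0, 0.0]])]   # measure concentrated on state pair (1, 0)
lam = 0.5
mix = mixture(pi1, pi2, lam)

# Mutual information (posterior-separable) is mixture linear: the two numbers agree.
print(mutual_information(prior, mix),
      lam * mutual_information(prior, pi1) + (1 - lam) * mutual_information(prior, pi2))
# The max-over-measures cost is strictly below the mixture of the costs,
# because the maximizing measure differs across the two experiments.
print(max_kl_cost(mix, weights),
      lam * max_kl_cost(pi1, weights) + (1 - lam) * max_kl_cost(pi2, weights))
```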
5. Implications for Rational Inattention and Information Economics
The axiomatic and representation results have substantial impact in the analysis of rational inattention, Bayesian persuasion, and broader information design problems. Specifically:
- Posterior-separable costs yield tractable characterizations of optimal strategies but may restrict the richness of equilibrium actions or signals (a toy optimization illustrating this tractability appears after this list).
- Max–Rényi-type costs enable modeling of scenarios where agents optimize over more nuanced and balanced signal structures, consistent with laboratory and field data.
- The framework allows explicit bounds and comparative statics in market selection, welfare analysis, and demand estimation, especially in environments where attention costs are empirically heterogeneous or context-dependent.
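As a concrete illustration of the tractability point above, here is a minimal sketch of optimal information acquisition under a posterior-separable (Shannon) cost, assuming a binary-state, binary-action matching problem and using brute-force concavification over Bayes-plausible two-posterior policies. The marginal cost `kappa`, the payoff structure, and all function names are assumptions made for illustration, not taken from the source.

```python
import numpy as np

# Toy rational-inattention problem: binary state, binary action, u(a, w) = 1{a = w},
# Shannon posterior-separable cost with marginal cost kappa (illustrative value).
def phi(q):
    """Negative-entropy potential of a binary belief q = P(state = 1)."""
    qs = np.array([q, 1.0 - q])
    qs = qs[qs > 0]
    return float(np.sum(qs * np.log(qs)))

def net_value(q, kappa):
    """Gross payoff of the best action at belief q minus kappa * phi(q)."""
    return max(q, 1.0 - q) - kappa * phi(q)

def optimal_two_signal_policy(prior=0.5, kappa=0.3, grid=401):
    """Brute-force concavification: search over Bayes-plausible pairs of
    posteriors q_lo <= prior <= q_hi (two posteriors suffice with two states);
    the signal weights are pinned down by Bayes-plausibility."""
    qs = np.linspace(1e-4, 1 - 1e-4, grid)
    best = (max(prior, 1 - prior), prior, prior)           # no-information benchmark
    for q_lo in qs[qs <= prior]:
        for q_hi in qs[qs >= prior]:
            if q_hi - q_lo < 1e-9:
                continue
            w_hi = (prior - q_lo) / (q_hi - q_lo)          # P(high signal)
            value = ((1 - w_hi) * net_value(q_lo, kappa)
                     + w_hi * net_value(q_hi, kappa)
                     + kappa * phi(prior))                 # cost = E[phi(q)] - phi(prior)
            if value > best[0]:
                best = (value, q_lo, q_hi)
    return best

print(optimal_two_signal_policy())   # (net payoff, low posterior, high posterior)
```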
A plausible implication is that modelers seeking empirical fidelity in rational inattention contexts should employ axiomatic approaches (mixture convexity, sub-additivity, Blackwell monotonicity) and consider Max–Rényi or convex-transformed posterior-separable cost specifications.
6. Future Directions and Applications
Several avenues extend directly from this foundation:
- Systematic testing of mixture convexity and sub-additivity in experimental designs to pin down which cost representations best explain observed information acquisition.
- Development of computational algorithms tailored to Max–Rényi costs for large-scale inference or optimal signaling structure computation.
- Application of these cost structures in dynamic settings (e.g., sequential experimentation, adaptive signal design) where balancing and sub-additivity play crucial roles.
- Identification and estimation theory in empirical models that exploit the axiomatic characterization to recover agents' cost functions from observed choice and signal data.
7. Summary Table: Axioms and Cost Representations
| Cost Representation | Mixture Linearity | Sub-additivity | Empirical Fit (Dean et al.) |
|---|---|---|---|
| Posterior-separable | Yes | Yes | Often violated |
| Max–KL | Yes (under dilution linearity) | Yes | Accommodates extra actions |
| Rényi | Monotone convex transform (not linear) | Yes | Flexible |
The distinction in empirical fit reflects that classical posterior-separable costs are often too restrictive, whereas Max–Rényi and convex-transformed costs accommodate broader phenomena observed in decision-making experiments.
In sum, posterior-separable cost functions anchor one end of the spectrum in convex information cost modeling, but recent axiomatic and empirical findings necessitate a move toward more flexible, divergence-based representations such as Max–Rényi costs. These models retain the benefits of convexity (for optimization and identification) while capturing departures from mixture linearity and enabling richer behavioral predictions in information economics.