Generalized Incomplete Contingency Tables (GICTs)

Updated 14 November 2025

GICTs are multidimensional categorical data arrays that contain entire rows of zeros arising from random missingness or sampling zeros, not from pre-specified structural exclusions.
The framework uses Poisson or multinomial sampling and log-linear parametrizations to model interactions among the observed cells.
Sharp nonparametric bounding techniques enable causal inference even when point identification fails because some rows are entirely unobserved.

Generalized Incomplete Contingency Tables (GICTs) are multidimensional arrays of categorical data that allow entire rows of zero counts inside the empirical support, arising from random missingness or sampling zeros. These tables arise naturally when certain combinations of variables happen to go completely unobserved for random or unpredictable reasons, as opposed to pre-specified structural exclusions. This entry summarizes the defining properties, mathematical frameworks, inference strategies, and causal-analysis methods associated with GICTs, integrating recent nonparametric bounding procedures and model-theoretic perspectives.

1. Definition and Conceptual Structure

Let $X_1,\ldots,X_n$ be categorical variables with supports $\mathcal{I}_1,\ldots,\mathcal{I}_n$, and let $Y$ be an outcome with support $\mathcal{J}$. A classical contingency table consists of the cell counts

$t_{i_1,\ldots,i_n,j} = \#\{\text{records with } X_1 = i_1, \ldots, X_n = i_n, Y = j\}$

for $(i_1,\ldots,i_n) \in \mathcal{I}_1 \times \cdots \times \mathcal{I}_n$ and $j \in \mathcal{J}$, with total sample size $N = \sum_{i_1,\ldots,i_n}\sum_{j \in \mathcal{J}} t_{i_1,\ldots,i_n,j}$. A table is a GICT if there exists at least one row $(\bar{i}_1, \ldots, \bar{i}_n)$ of the empirical support such that

$t_{\bar{i}_1, \ldots, \bar{i}_n, j} = 0 \quad \forall j \in \mathcal{J}.$

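Such all-zero rows are sampling (random) zeros lying inside the observed domain, not structural zeros induced by design.

GICTs generalize classical incomplete tables by treating these random zeros as the primary object of analysis and by explicitly modeling the absence of information in the affected rows. The distinction is crucial: for an all-zero row, the empirical conditional probability $\hat{P}(Y = j \mid X_1 = \bar{i}_1, \ldots, X_n = \bar{i}_n) = t_{\bar{i}_1,\ldots,\bar{i}_n,j} / \sum_{j'} t_{\bar{i}_1,\ldots,\bar{i}_n,j'}$ is the undefined ratio $0/0$. Certain conditional probabilities and marginal effects therefore cannot be defined or estimated directly, and alternative inferential strategies are needed.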
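As a minimal sketch (assuming NumPy, with the outcome $Y$ on the last array axis; the helper names `all_zero_rows`, `is_gict`, and `empirical_conditionals` are hypothetical), the definition can be checked mechanically: a table is a GICT when some covariate row sums to zero over $\mathcal{J}$, and the naive conditional estimate for that row is $0/0$, reported here as `NaN`. The example reuses the counts of the worked example in Section 5.

```python
import numpy as np

def all_zero_rows(t):
    """Covariate rows (i_1, ..., i_n) whose counts are zero for every outcome j.

    Assumes t has shape (|I_1|, ..., |I_n|, |J|), with the outcome Y on the last axis.
    """
    row_totals = t.sum(axis=-1)
    return [tuple(int(i) for i in idx) for idx in np.argwhere(row_totals == 0)]

def is_gict(t):
    """A table is a GICT if at least one covariate row is entirely zero."""
    return len(all_zero_rows(t)) > 0

def empirical_conditionals(t):
    """t[i, j] / sum_j' t[i, j'], with NaN where the row total is 0 (the 0/0 case)."""
    row_totals = t.sum(axis=-1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(row_totals > 0, t / row_totals, np.nan)

# Counts t[a, h, o] from the worked example: rows (A=0, H=0) and (A=1, H=1) are empty.
t = np.zeros((2, 2, 2), dtype=int)
t[0, 1] = [20, 10]
t[1, 0] = [30, 15]

print(all_zero_rows(t))                 # [(0, 0), (1, 1)]
print(empirical_conditionals(t)[0, 0])  # [nan nan]
```

The `NaN` entries mark exactly the conditionals that the bounding approach of Section 4 treats as free parameters instead of estimating.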

2. Model-Theoretic Foundations of GICTs

GICT analysis formalizes incomplete tables under either Poisson or multinomial sampling. Consider a cell index set $\mathcal{I} \subset \mathcal{Y}_1 \times \cdots \times \mathcal{Y}_K$ that may omit inaccessible (structurally forbidden) cells but contains every empirically observable combination, including those with random zeros. In this section $Y(i)$ denotes the count in cell $i$, and the random vector $\{Y(i): i \in \mathcal{I}\}$ is modeled as:

- Poisson: $Y(i) \sim \mathrm{Poi}(\lambda(i))$, independently across cells;
- Multinomial: $(Y(i): i \in \mathcal{I}) \sim \mathrm{Mult}(N, p)$.

Multiplicative (“log-linear-type”) models for GICTs are constructed using a model matrix $A$ whose rows correspond to cell subsets, yielding parameterizations

$\log \delta = A^{\prime} \beta,$

with either $\delta(i) = \lambda(i)$ or $\delta(i) = p(i)$ . The “relational” model framework captures both traditional log-linear and generalized odds-ratio models; the structure extends cleanly to GICTs with random zeros provided $A$ covers only the empirical domain (Klimova et al., 2011).

MLE existence and uniqueness are governed by the observed sufficient statistic $T(Y) = AY$: under Poisson sampling, the MLE exists (and the fitted $\delta$ is then unique) if and only if $AY$ lies in the relative interior of the convex cone generated by the columns of $A$. When the model does not include the overall effect, i.e., the all-ones vector is not in the row space of $A$, the model for probabilities becomes a curved exponential family, and a mixed mean–canonical parameterization combining subset sums with non-homogeneous odds ratios is employed.
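These quantities can be computed directly. The sketch below assumes NumPy and SciPy (`scipy.optimize.linprog` with the HiGHS solver); the function names and the three-cell example are hypothetical. It forms $T(Y) = AY$, tests whether the all-ones vector lies in the row space of $A$, and checks the Poisson existence condition using the fact that the relative interior of the cone spanned by the columns of $A$ is exactly $\{A\mu : \mu > 0\}$, so the MLE exists if and only if $AY = A\mu$ for some strictly positive $\mu$.

```python
import numpy as np
from scipy.optimize import linprog

def sufficient_statistic(A, y):
    """T(Y) = A y; rows of A are indicators of cell subsets, columns are cells."""
    return A @ y

def has_overall_effect(A, tol=1e-10):
    """True if the all-ones vector lies in the row space of A."""
    ones = np.ones(A.shape[1])
    coef, *_ = np.linalg.lstsq(A.T, ones, rcond=None)
    return np.linalg.norm(A.T @ coef - ones) < tol

def poisson_mle_exists(A, y, tol=1e-9):
    """Check whether A y = A mu for some mu > 0 by maximizing s subject to mu_i >= s."""
    k = A.shape[1]
    c = np.zeros(k + 1)
    c[-1] = -1.0                                        # maximize s  <=>  minimize -s
    A_eq = np.hstack([A, np.zeros((A.shape[0], 1))])    # A mu = A y
    A_ub = np.hstack([-np.eye(k), np.ones((k, 1))])     # s - mu_i <= 0
    bounds = [(0, None)] * k + [(None, 1.0)]            # cap s at 1; only its sign matters
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(k), A_eq=A_eq,
                  b_eq=sufficient_statistic(A, y), bounds=bounds, method="highs")
    return res.status == 0 and -res.fun > tol

# Hypothetical 2x2 table with cell (2,2) excluded; cells ordered (1,1), (1,2), (2,1).
A_indep = np.array([[1, 1, 0],   # row 1
                    [0, 0, 1],   # row 2
                    [1, 0, 1],   # column 1
                    [0, 1, 0]])  # column 2
# Two overlapping subsets whose row space does not contain the overall effect.
A_no_overall = np.array([[1, 1, 0],
                         [0, 1, 1]])
```

With `A_indep`, the counts `[5, 0, 2]` make the MLE fail to exist: the zero in cell (1,2) is the only contribution to the column-2 statistic, which forces $\mu_{12} = 0$.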

3. Hierarchical Log-Linear Parametrization and Missing Data Mechanisms

For GICTs arising from data subject to random or systematic missingness, the hierarchical log-linear parametrization extends to incorporate a missingness indicator $R_m \in \{1,2\}$ for each variable $Y_m$, with $R_m = 1$ (observed) or $R_m = 2$ (missing). The resulting model for the augmented table is

$\log \mu_{i,r} = \lambda + \sum_{m=1}^p \lambda_{Y_m}(i_m) + \sum_{m=1}^p \lambda_{R_m}(r_m) + \sum_{m<n} \lambda_{Y_m Y_n}(i_m, i_n) + \sum_{m<n} \lambda_{R_m R_n}(r_m, r_n) + \sum_{m,n} \lambda_{Y_m R_n}(i_m, r_n),$

subject to zero-sum constraints. This formulation encapsulates the full joint distribution over observed and missing-data patterns (Ghosh et al., 2016).
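For two binary variables ($p = 2$) the model can be evaluated numerically. The sketch below is a hypothetical illustration assuming NumPy: it uses effect coding ($+1$ for the first level, $-1$ for the second), which enforces the zero-sum constraints, builds the expected counts $\mu_{i,r}$ on the $2 \times 2 \times 2 \times 2$ augmented table, and then collapses them into the pieces that each response pattern actually reveals. The function and parameter names are placeholders, not notation from Ghosh et al. (2016).

```python
import numpy as np

S = np.array([1.0, -1.0])  # effect coding: level 1 -> +1, level 2 -> -1 (zero-sum)

def augmented_table(lam, main, yy, rr, yr):
    """Expected counts mu[i1, i2, r1, r2] for p = 2 binary variables.

    main: dict with keys "Y1", "Y2", "R1", "R2" (main effects);
    yy, rr: Y1-Y2 and R1-R2 interactions; yr[m][n]: the Y_{m+1} x R_{n+1} interaction.
    """
    i1, i2, r1, r2 = np.meshgrid(S, S, S, S, indexing="ij")
    Y, R = [i1, i2], [r1, r2]
    log_mu = (lam + main["Y1"] * i1 + main["Y2"] * i2 + main["R1"] * r1
              + main["R2"] * r2 + yy * i1 * i2 + rr * r1 * r2)
    for m in range(2):
        for n in range(2):
            log_mu = log_mu + yr[m][n] * Y[m] * R[n]
    return np.exp(log_mu)

def observed_parts(mu):
    """Collapse mu into what each response pattern reveals (R axis index 0 = observed)."""
    both = mu[:, :, 0, 0]                 # R = (1, 1): full Y1 x Y2 table
    y1_only = mu[:, :, 0, 1].sum(axis=1)  # R = (1, 2): Y2 missing, Y1 margin only
    y2_only = mu[:, :, 1, 0].sum(axis=0)  # R = (2, 1): Y1 missing, Y2 margin only
    neither = mu[:, :, 1, 1].sum()        # R = (2, 2): total count only
    return both, y1_only, y2_only, neither

main = {"Y1": 0.2, "Y2": -0.1, "R1": 0.8, "R2": 0.5}
mu_mcar = augmented_table(3.0, main, 0.3, 0.1, [[0.0, 0.0], [0.0, 0.0]])
mu_nmar = augmented_table(3.0, main, 0.3, 0.1, [[0.4, 0.0], [0.0, 0.0]])
```

With every `yr` entry zero, $\mu_{i,r}$ factorizes into a function of $i$ times a function of $r$, so the response probabilities $\mu_{i,r} / \sum_{r'} \mu_{i,r'}$ do not depend on $i$; a nonzero `yr[0][0]` makes them depend on $Y_1$.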

The missing-data mechanism of each variable $Y_m$ is characterized by which interactions $\lambda_{Y_n R_m}$ with its indicator $R_m$ vanish:

- MCAR (missing completely at random): all $\lambda_{Y_n R_m}(\cdot, \cdot) \equiv 0$.
- MAR (missing at random): some $\lambda_{Y_n R_m} \not\equiv 0$ for $n \neq m$, but $\lambda_{Y_m R_m} \equiv 0$.
- NMAR (not missing at random): only $\lambda_{Y_m R_m} \not\equiv 0$; all other $\lambda_{Y_n R_m} \equiv 0$.
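These rules can be encoded directly. In the hypothetical helper below, a fitted model is summarized by the set of index pairs $(n, m)$ for which $\lambda_{Y_n R_m} \not\equiv 0$, with variables numbered from 1 as in the text. The case in which $R_m$ interacts with both $Y_m$ and other variables is not covered by the three definitions; labeling it NMAR is an assumption of this sketch, justified by the nonzero $\lambda_{Y_m R_m}$ term.

```python
def classify_mechanism(m, nonzero):
    """Label the missingness mechanism of Y_m.

    nonzero: set of pairs (n, k) such that lambda_{Y_n R_k} is not identically zero.
    """
    partners = {n for (n, k) in nonzero if k == m}  # variables interacting with R_m
    if not partners:
        return "MCAR"
    if m in partners:
        # The pure case {m} is NMAR as defined; mixed cases are treated as NMAR
        # here by assumption, since missingness still depends on Y_m itself.
        return "NMAR"
    return "MAR"  # only Y_n with n != m interact with R_m


print(classify_mechanism(1, {(2, 1)}))          # MAR
print(classify_mechanism(1, {(1, 1), (2, 2)}))  # NMAR
```

Only interactions involving $R_m$ matter for $Y_m$: in the second call, the $(2, 2)$ term describes the missingness of $Y_2$ and is ignored.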

Closed-form sensitivity analyses of the mechanism (MAR versus MCAR or NMAR) compare response and non-response odds intervals computed directly from the fully observed cells and the partially observed margins, without iterative model fitting.

4. Inference and Sharp Nonparametric Bounding of Interventional Queries

When entire rows of a GICT are random zeros, causal or probabilistic queries involving those combinations are not point-identified. The framework of Lodato et al. (7 Nov 2025) introduces a sharp nonparametric bounding approach:

The unknown outcome distribution in each missing row $k$ is parameterized by a free vector $\pi_k = (\pi_k^j)_{j \in \mathcal{J}}$, where $\pi_k^j$ stands for the probability of outcome $j$ within row $k$, subject to
- Non-negativity: $\pi_k^j \ge 0$;
- Normalization: $\sum_j \pi_k^j = 1$ for each missing-row index $k$.

Given a symbolic expression $Q(\{\pi_k\})$ for the query of interest (such as $P(Y \mid \mathrm{do}(X=x))$ or the ATE), the sharp lower and upper bounds are

$Q_{\min} = \min_{\{\pi_k\}} Q(\pi), \qquad Q_{\max} = \max_{\{\pi_k\}} Q(\pi),$

where the optimization ranges over the feasible set formed by the product of the probability simplices, one for each missing row.

In practical scenarios where the missing rows are known to carry small total frequency relative to $N$, $Q(\pi)$ often reduces to a linear (or ratio-of-linear) function of $\pi$, and standard linear programming (or linear-fractional programming after the Charnes–Cooper transformation) computes the bounds efficiently. These bounds are mechanism-independent: they require only the known support and the axioms of probability, and they formally quantify the inferential uncertainty that the all-zero rows induce.
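As a minimal sketch of the linear and ratio-of-linear cases (assuming SciPy's `linprog` with the HiGHS solver; the function names and the row-by-row flattening of $\pi$ are hypothetical), the bounds can be computed as follows. For a ratio $(a_0 + a^\top \pi)/(b_0 + b^\top \pi)$ whose denominator is positive on the feasible set, the Charnes–Cooper substitution $t = 1/(b_0 + b^\top \pi)$, $y = t\pi$ turns each bound into a linear program in $(y, t)$.

```python
import numpy as np
from scipy.optimize import linprog

def simplex_rows(row_sizes):
    """Matrix whose k-th row sums the entries of pi_k (pi is flattened row by row)."""
    M = np.zeros((len(row_sizes), sum(row_sizes)))
    start = 0
    for k, size in enumerate(row_sizes):
        M[k, start:start + size] = 1.0
        start += size
    return M

def linear_bounds(c, c0, row_sizes):
    """Sharp bounds of Q(pi) = c0 + c @ pi over the product of simplices."""
    c = np.asarray(c, dtype=float)
    M = simplex_rows(row_sizes)
    ones = np.ones(len(row_sizes))
    lo = linprog(c, A_eq=M, b_eq=ones, bounds=(0, None), method="highs")
    hi = linprog(-c, A_eq=M, b_eq=ones, bounds=(0, None), method="highs")
    return c0 + lo.fun, c0 - hi.fun

def ratio_bounds(a, a0, b, b0, row_sizes):
    """Bounds of (a0 + a @ pi) / (b0 + b @ pi) via the Charnes-Cooper transformation.

    Assumes b0 + b @ pi > 0 for every feasible pi. Variables: z = (y, t), y = t * pi.
    """
    M = simplex_rows(row_sizes)
    K = M.shape[0]
    A_eq = np.vstack([
        np.hstack([M, -np.ones((K, 1))]),           # sum_j y_k^j - t = 0  (sum_j pi_k^j = 1)
        np.append(np.asarray(b, dtype=float), b0),  # b @ y + b0 * t = 1
    ])
    b_eq = np.append(np.zeros(K), 1.0)
    obj = np.append(np.asarray(a, dtype=float), a0)
    lo = linprog(obj, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    hi = linprog(-obj, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return lo.fun, -hi.fun
```

For example, `linear_bounds([0, 1, 0, -2], 0.5, [2, 2])` bounds $0.5 + \pi_1^1 - 2\pi_2^1$ for two binary missing rows. In the purely linear case the LP has a closed form that makes a useful sanity check: because the feasible set is a product of simplices, $Q_{\min} = c_0 + \sum_k \min_j c_k^j$ and $Q_{\max} = c_0 + \sum_k \max_j c_k^j$.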

5. Application to Causal Inference: Worked Example

To illustrate, consider a binary setting:

- $H$ (treatment), $A$ (covariate), $O$ (outcome), each in $\{0,1\}$.
- Causal graph: $A \to H \to O$ and $A \to O$, so $A$ confounds the effect of $H$ on $O$.
- The observed GICT of $O$ given $(A,H)$ has two missing rows, $(A=0,H=0)$ and $(A=1,H=1)$, both all-zero.

Empirical counts:

- $n_1=20$, $n_2=10$ for $(A=0,H=1,O=0)$ and $(A=0,H=1,O=1)$;
- $n_3=30$, $n_4=15$ for $(A=1,H=0,O=0)$ and $(A=1,H=0,O=1)$;
- $N=75$.

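Label the missing rows $k=1$ for $(A=0,H=0)$ and $k=2$ for $(A=1,H=1)$, so that $\pi_1^1 = P(O=1 \mid A=0, H=0)$ and $\pi_2^1 = P(O=1 \mid A=1, H=1)$. Adjusting for $A$, with $\hat P(A=0) = 30/75 = 0.4$ and $\hat P(A=1) = 45/75 = 0.6$ (treating the missing rows as having negligible frequency) and $\hat P(O=1 \mid A=0,H=1) = \hat P(O=1 \mid A=1,H=0) = 1/3$, the target is

$Q(\pi_1^1, \pi_2^1) = \mathrm{ATE} = 0.4\,(1/3 - \pi_1^1) + 0.6\,(\pi_2^1 - 1/3) = 0.6\,\pi_2^1 - 0.4\,\pi_1^1 - 1/15 \approx 0.6\,\pi_2^1 - 0.4\,\pi_1^1 - 0.0667.$

Minimizing and maximizing over $\pi_1^1, \pi_2^1 \in [0,1]$ yields $\mathrm{ATE} \in [-0.4667,\,0.5333]$. Even with two corners of the table unobserved, this bounds the average treatment effect under minimal, nonparametric assumptions; the interval has width $0.4 + 0.6 = 1$ and contains $0$, so the data alone cannot determine the sign of the effect.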
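The arithmetic can be reproduced exactly with Python's `fractions` module. This sketch assumes the backdoor adjustment over $A$ implied by the graph, with $P(A)$ estimated from the observed counts (the small-missing-frequency approximation); the variable and function names are illustrative. Because the ATE is linear in $(\pi_1^1, \pi_2^1)$ over the unit square, its extremes sit at corners: each coefficient contributes its negative part to the minimum and its positive part to the maximum. The computation gives a constant of $-1/15$ and bounds $[-7/15, 8/15] \approx [-0.4667, 0.5333]$.

```python
from fractions import Fraction

# Observed counts keyed by (a, h, o); rows (A=0, H=0) and (A=1, H=1) are all-zero.
counts = {(0, 1, 0): 20, (0, 1, 1): 10, (1, 0, 0): 30, (1, 0, 1): 15}
N = sum(counts.values())  # 75

def p_a(a):
    """P(A = a), estimated from the observed counts."""
    return Fraction(sum(v for (aa, _, _), v in counts.items() if aa == a), N)

def p_o1(a, h):
    """P(O = 1 | A = a, H = h), or None for an all-zero (missing) row."""
    total = counts.get((a, h, 0), 0) + counts.get((a, h, 1), 0)
    return Fraction(counts.get((a, h, 1), 0), total) if total else None

# ATE = sum_a P(a) [P(O=1 | a, H=1) - P(O=1 | a, H=0)]
#     = P(A=0) [p_o1(0, 1) - pi_1] + P(A=1) [pi_2 - p_o1(1, 0)],
# where pi_1 = P(O=1 | A=0, H=0) and pi_2 = P(O=1 | A=1, H=1) are unknown.
assert p_o1(0, 0) is None and p_o1(1, 1) is None
const = p_a(0) * p_o1(0, 1) - p_a(1) * p_o1(1, 0)
coef = {"pi_1": -p_a(0), "pi_2": p_a(1)}

# Linear over [0, 1]^2, so the extremes are attained at corners of the square.
ate_min = const + sum(min(c, 0) for c in coef.values())
ate_max = const + sum(max(c, 0) for c in coef.values())
print(f"ATE in [{float(ate_min):.4f}, {float(ate_max):.4f}]")  # ATE in [-0.4667, 0.5333]
```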

6. Implications, Assumptions, and Limitations

GICT bounds are often wide, but they always contain the true value. Their validity and sharpness rest on the following points:
- All random zeros must be internal to the empirical support;
- The small-missing-frequency approximation ($\sum_j x_k^j \ll N$, where $x_k^j$ is the unobserved frequency of outcome $j$ in missing row $k$) yields sharper and algebraically simpler bounds;
- No assumptions are made about the missing-data mechanism (MCAR, MAR, and NMAR are all accommodated without modeling).
The approach neither imputes missing data nor discards affected entries; instead, it preserves all of the resulting uncertainty in the explicit optimization.
If large portions of the table are missing, or the missing frequency is not negligible relative to $N$, the bounds may be less informative.

A plausible implication is that the GICT framework enables disciplined, mechanism-agnostic causal inference in high-dimensional settings with moderate 'random' unobservability, in contrast to traditional methods that require either imputation or strong missing-data assumptions.

7. Connections to Other Methodologies and Extensions

GICTs generalize both traditional incomplete tables and structural-zero models; the relational modeling perspective (Klimova et al., 2011) extends to arbitrary sets of allowed cells and encompasses curved exponential families, odds-ratio models, and canonical parameterizations for structural zeros. Log-linear parametrizations with auxiliary missingness indicators provide a systematic means for both modeling and sensitivity testing of possible missingness mechanisms (Ghosh et al., 2016).

Sensitivity analysis procedures can be applied non-iteratively, using only observed cell and margin counts, for empirical assessment of the plausibility of MAR, MCAR, or NMAR regimes.

This suggests a unifying role for GICTs as a framework for rigorous, mechanism-robust statistical modeling and inference in categorical data analysis, extending beyond classical contingency-table settings to applications in epidemiology, social science, and high-dimensional causal inference, where empirical supports are often only partially observed.

References

Klimova et al. (2011). Relational models for contingency tables.

Ghosh et al. (2016). Evaluation of missing data mechanisms in two and three dimensional incomplete tables.

Lodato et al. (2025). Bounding interventional queries from generalized incomplete contingency tables.
