Generalized Incomplete Contingency Tables (GICTs)

Updated 14 November 2025
  • GICTs are multidimensional categorical data arrays that include entire rows of zeros due to random missingness or design constraints.
  • The framework uses Poisson/multinomial sampling and log-linear parametrizations to model complex interactions within observed data.
  • Sharp nonparametric bounding techniques enable robust causal inference even when standard methods fail due to missing cells.

Generalized Incomplete Contingency Tables (GICTs) are multidimensional arrays of categorical data that allow for entire rows of zero counts within the empirical support, whether due to random missingness, sampling zeros, or structural design limitations. These tables arise naturally in settings where certain combinations of variables are completely unobserved for random or unpredictable reasons, as opposed to pre-specified structural exclusions. The present entry summarizes the defining properties, mathematical frameworks, inference strategies, and practical causal analysis methodologies associated with GICTs, integrating recent nonparametric bounding procedures and model-theoretic perspectives.

1. Definition and Conceptual Structure

Let $X_1,\ldots,X_n$ be categorical variables with supports $\mathcal{I}_1,\ldots,\mathcal{I}_n$ and $Y$ an outcome with support $\mathcal{J}$. A classical contingency table consists of cell counts

$$t_{i_1,\ldots,i_n,j} = \#\{\text{records with } X_1 = i_1, \ldots, X_n = i_n, Y = j\}$$

for $(i_1,\ldots,i_n) \in \mathcal{I}_1 \times \cdots \times \mathcal{I}_n$ and $j \in \mathcal{J}$, with total sample size $N = \sum_{i_1,\ldots,i_n}\sum_{j \in \mathcal{J}} t_{i_1,\ldots,i_n,j}$. A table is a GICT if there exists at least one empirical-support row $(\bar{i}_1, \ldots, \bar{i}_n)$ such that

$$t_{\bar{i}_1, \ldots, \bar{i}_n, j} = 0 \quad \forall j \in \mathcal{J}.$$

These are sampling zeros entirely within the observed domain, not design-induced missingness.

GICTs generalize classical incomplete tables by treating such random zeros as primary and by explicitly modeling the absence of information in affected rows. The distinction is crucial: in GICTs, the presence of entire rows of zeros prevents definition or direct estimation of certain conditional probabilities or marginal effects, necessitating alternative inferential strategies.
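
The defining condition can be checked mechanically. The following is a minimal sketch, assuming the table is stored as a dictionary from index tuples to counts and that the covariate levels passed in are the empirical supports; the function name `zero_rows` and the toy counts are hypothetical and not taken from the cited work.

```python
from itertools import product

def zero_rows(counts, covariate_levels, outcome_levels):
    """Return covariate combinations whose count is zero for every outcome level.

    counts maps (i_1, ..., i_n, j) -> cell count; absent keys are treated as 0.
    covariate_levels is a list with one tuple of levels per covariate
    (assumed here to be the empirical support of that covariate).
    """
    missing = []
    for row in product(*covariate_levels):
        if all(counts.get(row + (j,), 0) == 0 for j in outcome_levels):
            missing.append(row)
    return missing

# Hypothetical toy table: two binary covariates, one binary outcome.
counts = {
    (0, 1, 0): 8, (0, 1, 1): 4,
    (1, 0, 0): 6, (1, 0, 1): 2,
}
rows = zero_rows(counts, [(0, 1), (0, 1)], (0, 1))
is_gict = len(rows) > 0  # a table is a GICT if at least one row is all-zero
print(rows, is_gict)
```

Here the rows `(0, 0)` and `(1, 1)` have no records for any outcome level, so conditional probabilities of the outcome given those rows cannot be estimated directly.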

2. Model-Theoretic Foundations of GICTs

GICT analysis formalizes incomplete tables by adopting either Poisson or multinomial sampling frameworks. Consider a cell index set $\mathcal{I} \subset \mathcal{Y}_1 \times \cdots \times \mathcal{Y}_K$, potentially omitting inaccessible (structurally forbidden) cells but retaining all empirically observable combinations, including those with random zeros. The random vector $\{Y(i): i \in \mathcal{I}\}$ is modeled as:

  • Poisson: $Y(i) \sim \mathrm{Poi}(\lambda(i))$, independent,
  • Multinomial: $(Y(i): i \in \mathcal{I}) \sim \mathrm{Mult}(N, p)$.

Multiplicative (“log-linear-type”) models for GICTs are constructed using a model matrix $A$ whose rows correspond to cell subsets, yielding parameterizations

$$\log \delta = A^{\prime} \beta,$$

with either $\delta(i) = \lambda(i)$ or $\delta(i) = p(i)$. The “relational” model framework captures both traditional log-linear and generalized odds-ratio models; the structure extends cleanly to GICTs with random zeros provided $A$ covers only the empirical domain (Klimova et al., 2011).

MLE existence and uniqueness are governed by the sample realization of the sufficient statistics $T(Y) = AY$ falling in the interior of the convex hull of $A$'s support. When the overall effect (all-ones row) is absent due to missing rows, the model becomes a curved exponential family, and a mixed mean–canonical parameterization involving both subset sums and non-homogeneous odds ratios is employed.
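
To make the roles of $A$, $\beta$, and $T(Y)$ concrete, here is a small sketch for a $2 \times 2$ table under an independence-type model. The cell ordering, the choice of generating subsets, the counts, and the parameter values are all illustrative assumptions; the sketch only evaluates $\log \delta = A^{\prime}\beta$ and $T(Y) = AY$ and does not fit the model or test MLE existence.

```python
import math

# Cells of a 2x2 table in a fixed (assumed) order.
cells = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Each row of A is the indicator vector of one generating subset of cells.
A = [
    [1, 1, 1, 1],  # overall effect: all cells
    [0, 0, 1, 1],  # cells where the first variable equals 1
    [0, 1, 0, 1],  # cells where the second variable equals 1
]

def sufficient_statistics(A, y):
    """T(Y) = A y: the subset sums of the observed counts."""
    return [sum(a * c for a, c in zip(row, y)) for row in A]

def cell_parameters(A, beta):
    """delta(i) = exp(sum_s A[s][i] * beta[s]), i.e. log(delta) = A' beta."""
    n_cells = len(A[0])
    return [
        math.exp(sum(A[s][i] * beta[s] for s in range(len(A))))
        for i in range(n_cells)
    ]

y = [12, 0, 7, 5]  # hypothetical counts, one of them a random zero
T = sufficient_statistics(A, y)

beta = [math.log(2.0), math.log(3.0), math.log(0.5)]  # hypothetical parameters
delta = cell_parameters(A, beta)
odds_ratio = delta[0] * delta[3] / (delta[1] * delta[2])
print(T, delta, odds_ratio)
```

With these subsets the implied odds ratio equals 1 for any $\beta$, which is the independence structure; dropping the all-ones row from `A` would give a model without the overall effect, the case described above as a curved exponential family.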

3. Hierarchical Log-Linear Parametrization and Missing Data Mechanisms

For GICTs arising from data subject to random or systematic missingness, the hierarchical log-linear parametrization extends to incorporate missingness indicators $R_m \in \{1,2\}$ for each variable $Y_m$, with $R_m=1$ (observed) or $R_m=2$ (missing). The resulting model for the augmented table is

$$\log \mu_{i,r} = \lambda + \sum_{m=1}^p \lambda_{Y_m}(i_m) + \sum_{m=1}^p \lambda_{R_m}(r_m) + \sum_{m<n} \lambda_{Y_m Y_n}(i_m, i_n) + \sum_{m<n} \lambda_{R_m R_n}(r_m, r_n) + \sum_{m,n} \lambda_{Y_m R_n}(i_m, r_n),$$

subject to zero-sum constraints. This formulation encapsulates the full joint distribution over observed and missing-data patterns (Ghosh et al., 2016).

Missing-data mechanisms for each variable can be characterized as:

  • MCAR (missing completely at random): all $\lambda_{Y_n R_m}(\cdot, \cdot) \equiv 0$.
  • NMAR (not missing at random): only $\lambda_{Y_m R_m} \not\equiv 0$, all other $\lambda_{Y_n R_m} \equiv 0$.
  • MAR (missing at random): some $\lambda_{Y_n R_m} \not\equiv 0$ for $n \neq m$, but $\lambda_{Y_m R_m} \equiv 0$.
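
The three definitions above amount to a lookup on which interaction terms between variables and the missingness indicator $R_m$ are non-zero. The sketch below encodes exactly that rule; the function name, the representation of non-zero terms as index pairs, and the `"unclassified"` label for the case not covered by the three definitions (both a self-term and a cross-term non-zero) are assumptions made for illustration.

```python
def classify_mechanism(m, nonzero_terms, n_vars):
    """Classify the missingness mechanism of variable Y_m.

    nonzero_terms is a set of pairs (n, m), each meaning that the
    interaction lambda_{Y_n R_m} is not identically zero.
    Variables are indexed 1..n_vars.
    """
    self_term = (m, m) in nonzero_terms
    cross_term = any(
        (n, m) in nonzero_terms for n in range(1, n_vars + 1) if n != m
    )
    if not self_term and not cross_term:
        return "MCAR"
    if self_term and not cross_term:
        return "NMAR"
    if cross_term and not self_term:
        return "MAR"
    # Both kinds of term are non-zero: not covered by the three definitions.
    return "unclassified"

# Hypothetical two-variable example: missingness of Y_1 depends on Y_2 only,
# and missingness of Y_2 depends on Y_2 itself.
terms = {(2, 1), (2, 2)}
print(classify_mechanism(1, terms, 2), classify_mechanism(2, terms, 2))
```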

Direct, closed-form sensitivity analyses of mechanism (MAR vs. MCAR/NMAR) are carried out by comparing response and non-response odds intervals derived purely from fully observed and partially observed margins.

4. Inference and Sharp Nonparametric Bounding of Interventional Queries

When entire rows in a GICT are random zeros, causal or probabilistic queries involving those combinations become non-identifiable in the classical sense. The framework developed in (Lodato et al., 7 Nov 2025) introduces a sharp nonparametric bounding approach:

  • Unknown cell probabilities in missing rows are parameterized by free vectors $\pi_k^j$ subject to
    • Non-negativity: $\pi_k^j \ge 0$,
    • Normalization: $\sum_j \pi_k^j = 1$ for each missing-row index $k$.

Given a symbolic expression $Q(\{\pi_k\})$ for the query of interest (such as $P(Y \mid \mathrm{do}(X=x))$ or the ATE), under these constraints, the lower and upper sharp bounds are

$$Q_{\min} = \min_{\{\pi_k\}} Q(\pi), \qquad Q_{\max} = \max_{\{\pi_k\}} Q(\pi),$$

where optimization is performed over the feasible set determined by the probability simplex for each missing row.

In practical scenarios where missing rows are known to have small total frequency compared to $N$, the expression $Q(\pi)$ often reduces to a linear (or ratio-of-linear) function of $\pi$, and standard linear programming (or fractional programming after Charnes–Cooper transformation) produces the bounds efficiently. These bounds are mechanism-independent, requiring only support and basic probability axioms, and provide formal quantification of inferential uncertainty in the presence of GICTs.
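
For the purely linear case, the linear program is simple enough to solve by inspection: a linear function over a product of probability simplices separates by row and attains its extremes at vertices, so each missing row contributes its smallest (or largest) coefficient. The sketch below uses that fact instead of a general LP solver; the function name and the coefficients are hypothetical, and the ratio-of-linear case (which needs the Charnes–Cooper transformation and a solver) is not covered.

```python
def linear_query_bounds(intercept, coeffs):
    """Sharp bounds of Q(pi) = intercept + sum_k sum_j coeffs[k][j] * pi_k^j.

    Each pi_k ranges over the probability simplex of its own missing row,
    so the minimum (maximum) puts all mass of row k on its smallest
    (largest) coefficient.
    """
    lower = intercept + sum(min(row) for row in coeffs)
    upper = intercept + sum(max(row) for row in coeffs)
    return lower, upper

# Hypothetical query with two missing rows: one with three outcome
# levels and one with two.
coeffs = [
    [0.0, 0.3, -0.2],
    [0.5, 0.1],
]
q_min, q_max = linear_query_bounds(0.1, coeffs)
print(q_min, q_max)
```

A general-purpose LP solver would return the same bounds here; the closed form is used only because it keeps the example dependency-free.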

5. Application to Causal Inference: Worked Example

To illustrate, consider a binary setting:

  • $H$ (treatment), $A$ (covariate), $O$ (outcome), each $\in \{0,1\}$.
  • Causal graph: $A \to H \to O$, $A \to O$.
  • The observed GICT for $(A,H) \to O$ features two missing rows: $(A=0,H=0)$ and $(A=1,H=1)$, both all-zero.

Empirical counts:

  • $n_1=20$, $n_2=10$ for $(A=0,H=1,O=0)$ and $(A=0,H=1,O=1)$,
  • $n_3=30$, $n_4=15$ for $(A=1,H=0,O=0)$ and $(A=1,H=0,O=1)$,
  • $N=75$.

The target, $Q(\pi_1^1, \pi_2^1) = \mathrm{ATE} \approx 0.6\,\pi_2^1 - 0.4\,\pi_1^1 - 0.0167$, is minimized and maximized over $\pi_1^1, \pi_2^1 \in [0,1]$, producing $\mathrm{ATE} \in [-0.4167,\,0.5833]$. Even with two corners of the table unobserved, this bounds the average treatment effect under minimal, nonparametric assumptions.
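
The interval can be reproduced by evaluating the quoted linear expression at the four corners of the unit square, since a linear function on a box attains its extremes at corners. The sketch below takes the coefficients $0.6$, $-0.4$, and $-0.0167$ as given in the text rather than rederiving them from the counts; the function and variable names are illustrative.

```python
from itertools import product

def ate(pi_1, pi_2):
    """Linear ATE expression as quoted in the text.

    pi_1 and pi_2 stand for the unknown probabilities pi_1^1 and pi_2^1
    of the two missing rows.
    """
    return 0.6 * pi_2 - 0.4 * pi_1 - 0.0167

# Evaluate at every corner (pi_1, pi_2) of [0, 1]^2.
values = {corner: ate(*corner) for corner in product((0.0, 1.0), repeat=2)}
argmin = min(values, key=values.get)
argmax = max(values, key=values.get)
ate_min, ate_max = values[argmin], values[argmax]
print(argmin, round(ate_min, 4), argmax, round(ate_max, 4))
```

The lower bound is attained at $(\pi_1^1, \pi_2^1) = (1, 0)$ and the upper bound at $(0, 1)$, matching the interval stated above.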

6. Implications, Assumptions, and Limitations

  • GICT bounds are conservative, often wide, but always contain the true value under the assumptions:
    • All random zeros must be internal to the empirical support;
    • The small-missing-frequency approximation ($\sum_j x_k^j \ll N$) yields sharper and algebraically simpler bounds;
    • No assumptions are made about the missing-data mechanism (MCAR, MAR, NMAR are all accommodated without modeling).
  • The approach does not impute missing data or discard affected entries, but preserves all uncertainty in the explicit optimization.
  • If large portions of the table are missing, or missingness is not negligible relative to $N$, bounds may be less informative.

A plausible implication is that the GICT framework enables disciplined, mechanism-agnostic causal inference in high-dimensional settings with moderate 'random' unobservability, in contrast to traditional methods that require either imputation or strong missing-data assumptions.

7. Connections to Other Methodologies and Extensions

GICTs generalize both traditional incomplete tables and structural-zero models; the relational modeling perspective (Klimova et al., 2011) extends to arbitrary sets of allowed cells and encompasses curved exponential families, odds-ratio models, and canonical parameterizations for structural zeros. Log-linear parametrizations with auxiliary missingness indicators provide a systematic means for both modeling and sensitivity testing of possible missingness mechanisms (Ghosh et al., 2016).

Sensitivity analysis procedures can be applied non-iteratively, using only observed cell and margin counts, for empirical assessment of the plausibility of MAR, MCAR, or NMAR regimes.

This suggests a unifying role for GICTs as a framework for rigorous, mechanism-robust statistical modeling and inference in categorical data analysis. That role reaches beyond the original context of contingency tables to applications in epidemiology, social science, and high-dimensional causal inference, where empirical supports are often only partially observed.
