
Probably Approximately Symmetric Framework

Updated 15 April 2026
  • The PAS framework integrates plug‐in and pseudo-PML methods to achieve near-optimal estimation of symmetric properties under varying sample count regimes.
  • It defines symmetry as invariance under label permutations and employs sample splitting to allocate ‘easy’ and ‘hard’ symbols for improved estimation accuracy.
  • The approach extends to geometric symmetry detection by approximating shape distortions through sublinear sampling while ensuring theoretical performance guarantees.

The Probably Approximately Symmetric (PAS) framework denotes two distinct but foundational approaches in the literature. One line (Charikar et al., 2020) develops a general-purpose framework for optimal symmetric property estimation over distributions—the focus of information-theoretic property estimation. The other (Korman et al., 2014) introduces PAS for efficient symmetry detection in geometric shapes, with strong theoretical guarantees. Both share the core idea of blending probabilistic and approximate methods to handle symmetry, but their technical domains and mechanisms are fundamentally different.

1. Symmetric Property Estimation: Principle and Formalism

Given a finite alphabet $D$ of size $N$ with discrete distribution $p\in\Delta_N$ (the $N$-simplex), symmetric properties are scalar functions $f:\Delta_N\to\mathbb{R}$ that are invariant under relabeling of $D$. A broad and important subclass is the separable symmetric properties, $f(p) = \sum_{x\in D}g(p_x)$ for scalar $g$. Empirically, for $n$ i.i.d. samples $X^n$, the per-symbol counts $n_x$ define the profile $\phi$, the histogram of symbol count frequencies.

A central fact is that a symmetric property estimator needs only the profile $\phi$, not the labeling. Classic properties such as support size ($\sum_{x\in D}\mathbf{1}[p_x>0]$), Shannon entropy ($-\sum_{x\in D}p_x\log p_x$), and $\ell_1$-distance to uniformity fit this formalism.
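As a small illustration of the profile, the sketch below counts how many symbols appear exactly once, twice, and so on, and checks that relabeling the symbols leaves the result unchanged. The function name `profile` and the dictionary representation are choices made for this example, not notation from the source.

```python
from collections import Counter

def profile(samples):
    """Map each count j to the number of distinct symbols seen exactly j times."""
    counts = Counter(samples)               # symbol -> number of occurrences
    return dict(Counter(counts.values()))   # count -> number of symbols with that count

original = profile("abracadabra")
# Same sequence after relabeling a->x, b->y, r->z, c->w, d->v.
relabeled = profile("xyzxwxvxyzx")
print(original == relabeled)
```

Because the profile discards symbol identities, any estimator built on it is automatically invariant under relabeling.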

2. The Plug-in Estimator and the “Easy” Regime

When the per-symbol sample count is large, empirical estimation (the plug-in estimator $f(\hat p)$, where $\hat p_x$ is the empirical frequency of symbol $x$) is minimax-optimal for many $f$. Specifically, for smooth $g$ in “large” $p_x$ regions ([…]), and $n$ exceeding the effective support scale (tuned to the property), the bias and variance are controlled:

  • For Shannon entropy, if […], the sample complexity is […].
  • For $\ell_1$-distance to uniformity, in the regime […], […] suffices.

In these “easy” regions, the PAS framework reverts to the empirical estimator, unifying both the trivial and complex regimes.
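A minimal sketch of the plug-in estimator for a separable property may help here. It evaluates $g$ at the empirical frequencies and sums over the observed symbols. The names `plugin_estimate` and `entropy_term` are illustrative, and the natural logarithm is an assumption of this example.

```python
import math
from collections import Counter

def plugin_estimate(samples, g):
    """Plug-in estimate of f(p) = sum_x g(p_x), using empirical frequencies."""
    n = len(samples)
    counts = Counter(samples)
    # Only observed symbols contribute; this is adequate when g(0) = 0.
    return sum(g(c / n) for c in counts.values())

def entropy_term(p):
    # g(p) = -p log p (natural log), with g(0) = 0 by convention.
    return -p * math.log(p) if p > 0 else 0.0

# Well-sampled case: four symbols, each seen twice.
h_easy = plugin_estimate(list("aabbccdd"), entropy_term)

# Under-sampled case: five samples, all distinct.
h_hard = plugin_estimate(list(range(5)), entropy_term)
print(h_easy, h_hard)
```

In the second call the estimate equals $\log 5$ regardless of how large the underlying alphabet is, since the plug-in entropy of $n$ samples can never exceed $\log n$. This is one way to see why the plug-in approach is only reliable when symbols are sampled many times.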

3. The Difficult Region: Profile Maximum Likelihood and Pseudo-PML

When $p_x$ becomes small ([…], or $g$ nonsmooth near $0$), empirical methods suffer from significant bias. For such hard regimes, the Profile Maximum Likelihood (PML) estimator is introduced:

  • The PML distribution $p_{\mathrm{PML}}$ solves $p_{\mathrm{PML}}\in\arg\max_{p}\Pr_p(\phi)$, maximizing the probability of observing the sample profile $\phi$ under $p$.
  • Acharya–Das–Orlitsky–Suresh (ADOS’16) established that substituting $p_{\mathrm{PML}}$ into $f$ yields universal minimax-optimal estimators for bounded […].

However, exact PML is computationally intractable ([…]-hard and NP-hard). The PAS framework replaces exact PML with computationally feasible approximate variants, called pseudo-PML, by restricting attention to subsets $S$ of “difficult” symbols (typically those with small counts). The $S$-pseudo-profile $\phi_S$ and its corresponding pseudo-PML distribution $p_S$ are optimized only over $S$, exploiting lower complexity for tractability (e.g., via convex relaxations or Sinkhorn scaling).
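To make the PML objective concrete, the following brute-force sketch computes the probability of a profile by enumerating every sequence of the given length, then grid-searches over two-symbol distributions $(q, 1-q)$. Restricting the search to support size two and to a grid is an assumption made purely for illustration. It is not the pseudo-PML procedure, and the helper names are hypothetical.

```python
from collections import Counter
from itertools import product

def profile(seq):
    # Sorted multiset of symbol counts, e.g. "aab" -> (1, 2).
    return tuple(sorted(Counter(seq).values()))

def profile_probability(p, target, n):
    """Sum the probabilities of all length-n sequences whose profile equals target."""
    total = 0.0
    for seq in product(range(len(p)), repeat=n):
        if profile(seq) == target:
            prob = 1.0
            for x in seq:
                prob *= p[x]
            total += prob
    return total

target = profile("aab")
grid = [i / 100 for i in range(1, 100)]
# Pick the q whose distribution (q, 1 - q) makes the observed profile most likely.
best_q = max(grid, key=lambda q: profile_probability((q, 1 - q), target, 3))
print(best_q)
```

For this tiny instance the profile probability is $3q(1-q)$, so the search settles on the uniform distribution. The enumeration grows exponentially with the sample size, which illustrates why exact maximization is avoided in favor of approximate variants on small subsets.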

4. The PAS Framework: Two-Stage Estimation and Sample Complexity

PAS proceeds in two stages, organized as four steps:

  1. Sample Splitting: Split the samples into two parts, $X_1$ and $X_2$.
  2. Subset Selection: Use $X_1$ to define the hard subset $S$ (symbols with frequency in a target set $F$), and $D\setminus S$ as the good subset.
  3. Pseudo-PML Estimation on $S$: On $X_2$, estimate the $S$-pseudo-profile, and compute a $\beta$-approximate pseudo-PML distribution $p_S$.
  4. Combined Estimation: For $x\in D\setminus S$, use plug-in with bias correction. Return

$$\hat f = \sum_{x\in S} g\big(p_S(x)\big) + \sum_{x\in D\setminus S} \tilde g(\hat p_x),$$

where $\tilde g(\hat p_x)$ denotes the bias-corrected plug-in term for symbol $x$.

Main sample complexity result: For a property with “complexity” […] (e.g., […] for entropy), PAS attains […] with high probability as soon as

[…]

This rate matches known instance-optimal bounds throughout both easy and hard regimes.

Comparison with previous PML-based methods ([ADOS’16]): PAS eliminates the need for property-specific polynomial approximations and broadens near-optimality.

5. Algorithmic Structure and Implementation

The PAS algorithm can be summarized as follows:

  • Input: $n$ samples, property $f$, threshold set $F$.
  • Step 1: Split samples into $X_1$, $X_2$.
  • Step 2: Define $S$ from $X_1$ via $F$; let $\bar S = D\setminus S$.
  • Step 3: Extract $S$-pseudo-profile from $X_2$.
  • Step 4: Compute $\beta$-approximate $S$-pseudo-PML using convex-concave surrogates, Sinkhorn, or local methods for […] with […].
  • Step 5: For $x\in\bar S$, use plug-in with correction; for $x\in S$, use the pseudo-PML distribution.
  • Step 6: Return combined estimator as above.

The pseudo-PML optimization dominates runtime but is practical for […] or […].
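The control flow of the steps above can be sketched as follows. This is only a skeleton under stated assumptions: the hard subset is chosen by a simple count threshold (`max_hard_count`, a hypothetical parameter standing in for the threshold set), the plug-in part omits the bias correction, and the pseudo-PML solver is replaced by a caller-supplied function. The stand-in `plugin_on_counts` is not a pseudo-PML solver.

```python
import math
from collections import Counter

def entropy_term(p):
    # g(p) = -p log p (natural log), with g(0) = 0 by convention.
    return -p * math.log(p) if p > 0 else 0.0

def pas_sketch(samples, g, max_hard_count, hard_estimator):
    """Select hard symbols on the first half, estimate on the second half."""
    half = len(samples) // 2
    first, second = samples[:half], samples[half:]
    c1, c2 = Counter(first), Counter(second)
    m = len(second)

    # Hard subset: symbols whose first-half count is at most the threshold
    # (symbols unseen in the first half have count 0 and are treated as hard).
    hard = {x for x in c2 if c1[x] <= max_hard_count}

    # Easy symbols: plain plug-in on the second half (no bias correction here).
    easy_part = sum(g(c / m) for x, c in c2.items() if x not in hard)

    # Hard symbols: pass their second-half counts to the supplied estimator.
    hard_counts = {x: c2[x] for x in hard}
    return easy_part + hard_estimator(hard_counts, m, g)

def plugin_on_counts(counts, m, g):
    # Stand-in only: plain plug-in, NOT a pseudo-PML solver.
    return sum(g(c / m) for c in counts.values())

samples = list("aaaabc") + list("aaabbd")
print(pas_sketch(samples, entropy_term, 1, plugin_on_counts))
```

Keeping the hard-subset estimator as a parameter mirrors the separation between subset selection and estimation: any approximate solver for the hard symbols could be dropped in without changing the surrounding logic.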

6. Applications and Worked Examples

PAS achieves near-optimal sample complexity in the estimation of core symmetric properties:

Property                          Complexity   Sample Complexity   Empirical Suffices When
Shannon entropy                   […]          […]                 […]
$\ell_1$ distance to uniformity   […]          […]                 […]
Support size under […]            […]          […]                 all […] (PML-optimal)

For entropy, PAS transitions from plug-in in the easy regime to pseudo-PML on the rare-symbol tail, capturing missing-mass behavior. For support size estimation, PML plug-in is minimax-optimal for all […].

7. Summary and Further Directions

The PAS framework (Charikar et al., 2020) provides a unified, instance-optimal estimation strategy for a broad class of separable symmetric properties, integrating plug-in estimators and tractable PML-based correction. The key algorithmic insight is constraining expensive optimization to small “hard” subsets, ensuring computational feasibility while matching information-theoretic lower bounds. Open questions include developing polynomial-time […] approximations for general PML, extending to non-separable properties (e.g., Rényi entropy), and generalizing PAS to more complex statistical settings (e.g., multi-sample estimation, testing).

The secondary usage, PAS in geometric symmetry detection (Korman et al., 2014), follows similar probabilistic-approximate principles. Here, a rigid transformation $T$ is an $\epsilon$-symmetry of a shape […] if its distortion in […] norm (integrating the level-set difference over the ball […]) is at most $\epsilon$. The algorithm samples […] at density tied to the total variation of the shape, using sublinear random sampling to estimate distortion, and achieves […]-probability correctness within user-specified accuracy […] and complexity […].
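The idea of estimating distortion from random samples can be illustrated with a simplified sketch. It treats the shape as a binary indicator function and measures the fraction of a disk on which the shape and its transformed copy disagree. The binary shape, the normalization by disk area, and all names below are assumptions of this example. It is not the algorithm of Korman et al., which also searches over transformations.

```python
import math
import random

def estimate_distortion(inside, transform, radius, num_samples, seed=0):
    """Monte Carlo estimate of the disk fraction where shape and transformed shape differ."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(num_samples):
        # Uniform point in the disk (square root gives uniform area density).
        r = radius * math.sqrt(rng.random())
        theta = 2 * math.pi * rng.random()
        x, y = r * math.cos(theta), r * math.sin(theta)
        tx, ty = transform(x, y)
        if inside(x, y) != inside(tx, ty):
            mismatches += 1
    return mismatches / num_samples

def square(x, y):
    # Axis-aligned square of side 2 centered at the origin.
    return abs(x) <= 1.0 and abs(y) <= 1.0

def quarter_turn(x, y):
    # Exact rotation by 90 degrees about the origin.
    return (-y, x)

def rotation(angle):
    c, s = math.cos(angle), math.sin(angle)
    return lambda x, y: (c * x - s * y, s * x + c * y)

d_quarter = estimate_distortion(square, quarter_turn, 1.5, 5000)
d_eighth = estimate_distortion(square, rotation(math.pi / 4), 1.5, 20000)
print(d_quarter, d_eighth)
```

The quarter turn is a true symmetry of the square, so no sampled point disagrees, while the 45-degree rotation produces a clearly positive estimate. The number of samples controls the accuracy of the estimate independently of how finely the shape itself is represented.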

Both paradigms demonstrate the power of combining probabilistic correctness with approximate or subsampled optimization to achieve theoretical tightness and computational practicality—unifying the theory and practice of symmetric estimation and detection across statistics and geometry (Charikar et al., 2020, Korman et al., 2014).

