Probably Approximately Symmetric Framework

Updated 15 April 2026

The PAS framework combines plug-in and pseudo-PML estimators to achieve near-optimal estimation of symmetric properties across regimes of per-symbol sample counts.
It treats symmetry as invariance under permutations of the labels, and uses sample splitting to separate ‘easy’ symbols from ‘hard’ ones so that each group gets the estimator suited to it.
The geometric variant detects approximate symmetries of shapes by estimating distortions from a sublinear number of random samples, with theoretical performance guarantees.

The name Probably Approximately Symmetric (PAS) framework refers to two distinct approaches in the literature. One line of work (Charikar et al., 2020) develops a general-purpose framework for sample-optimal estimation of symmetric properties of distributions, a central problem in information-theoretic property estimation. The other (Korman et al., 2014) uses the PAS name for efficient symmetry detection in geometric shapes, with strong theoretical guarantees. Both blend probabilistic and approximate methods to handle symmetry, but their technical domains and mechanisms are fundamentally different.

1. Symmetric Property Estimation: Principle and Formalism

Given a finite alphabet $D$ of size $N$ and a discrete distribution $p\in\Delta_N$ (the probability simplex over $D$), a symmetric property is a scalar function $f:\Delta_N\to\mathbb{R}$ that is invariant under relabeling of $D$. A broad and important subclass is the separable symmetric properties, $f(p) = \sum_{x\in D}g(p_x)$ for a scalar function $g$. Empirically, for $n$ i.i.d. samples $X^n$, the counts $n_x = |\{i : X_i = x\}|$ define the profile $\phi = (\phi_1,\dots,\phi_n)$, where $\phi_j = |\{x\in D : n_x = j\}|$ is the number of symbols observed exactly $j$ times; that is, $\phi$ is the histogram of symbol count frequencies.

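A central fact is that the profile is a sufficient statistic for symmetric properties: an estimator of a symmetric $f$ loses nothing by depending only on $\phi$ rather than on the labels. Classic properties such as support size ($\sum_{x}\mathbf{1}[p_x>0]$), Shannon entropy ($H(p) = -\sum_{x}p_x\log p_x$), and $\ell_1$-distance to uniformity ($\sum_{x}|p_x - 1/N|$) fit this formalism.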
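These definitions are easy to compute. The following Python sketch (helper names are illustrative, not taken from the paper) builds the counts $n_x$ (how often each symbol occurs), the profile $\phi$ (stored as a dict mapping $j$ to the number of symbols seen exactly $j$ times), and evaluates a separable property $f(p)=\sum_x g(p_x)$. It also checks that relabeling the symbols leaves the profile unchanged.

```python
from collections import Counter
import math

def counts(samples):
    """n_x: how many times each observed symbol x appears in the sample."""
    return Counter(samples)

def profile(samples):
    """phi as a dict {j: phi_j}, where phi_j = number of symbols seen exactly j times."""
    return dict(Counter(counts(samples).values()))

def separable_property(p, g):
    """f(p) = sum_x g(p_x) for a distribution given as {symbol: probability}."""
    return sum(g(px) for px in p.values())

def entropy_term(t):
    """g(t) = -t log t (natural log), with the convention g(0) = 0."""
    return -t * math.log(t) if t > 0 else 0.0

s1 = list("abracadabra")
relabel = {"a": "x", "b": "y", "r": "z", "c": "u", "d": "v"}
s2 = [relabel[c] for c in s1]              # same sample with every symbol renamed

print(profile(s1))                          # {5: 1, 2: 2, 1: 2}
print(profile(s1) == profile(s2))           # True
print(separable_property({"a": 0.5, "b": 0.25, "c": 0.25}, entropy_term))  # 1.5 * log 2
```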

2. The Plug-in Estimator and the “Easy” Regime

When the per-symbol sample count is large, empirical estimation, i.e., the plug-in estimator $f(\hat p)$ with $\hat p_x = n_x/n$, is minimax-optimal for many $f$. Specifically, when $g$ is smooth over the probabilities that matter ($np_x$ large, so every relevant symbol is observed many times) and $n$ exceeds the effective support scale (tuned to the property), the bias and variance of the plug-in estimator are controlled:

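For Shannon entropy, the plug-in estimator needs $n = \Theta(N/\varepsilon + \log^2 N/\varepsilon^2)$ samples for accuracy $\varepsilon$, while the optimal sample complexity is $\Theta\big(N/(\varepsilon\log N) + \log^2 N/\varepsilon^2\big)$; the two agree once $\varepsilon \lesssim \log^2 N/N$, where the $\log^2 N/\varepsilon^2$ term dominates.
For $\ell_1$-distance to uniformity, the empirical estimator likewise suffices once $n$ is large enough that symbols of probability around $1/N$ are each observed many times.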
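The contrast between the two regimes is easy to see numerically. The Python sketch below (function names and the uniform test distributions are assumptions chosen for illustration) implements the plug-in estimators for entropy and $\ell_1$-distance to uniformity. With many samples per symbol, the entropy estimate is close to $\log N$. With $n < N$, it can never exceed $\log n$, and the estimated distance to uniformity is at least $0.9$ even though the true value is $0$.

```python
import math
import random
from collections import Counter

def empirical(samples):
    """Empirical distribution p_hat_x = n_x / n over the observed symbols."""
    n = len(samples)
    return {x: c / n for x, c in Counter(samples).items()}

def plugin_entropy(samples):
    """Plug-in Shannon entropy H(p_hat), in nats."""
    return -sum(q * math.log(q) for q in empirical(samples).values())

def plugin_l1_to_uniform(samples, N):
    """Plug-in l1 distance between p_hat and the uniform distribution.

    Assumes every sample is one of N symbols.
    """
    p_hat = empirical(samples)
    unseen = N - len(p_hat)                # each unseen symbol contributes |0 - 1/N|
    return sum(abs(q - 1 / N) for q in p_hat.values()) + unseen / N

rng = random.Random(0)

# Easy regime: N = 10 symbols, about 10,000 samples per symbol.
easy = [rng.randrange(10) for _ in range(100_000)]
print(plugin_entropy(easy), math.log(10))                 # very close to each other

# Hard regime: N = 10,000 symbols but only n = 1,000 samples.
N, n = 10_000, 1_000
hard = [rng.randrange(N) for _ in range(n)]
print(plugin_entropy(hard), math.log(n), math.log(N))     # estimate <= log n < log N
print(plugin_l1_to_uniform(hard, N))                      # >= 0.9, though the truth is 0
```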

In these “easy” regions, the PAS framework reverts to the empirical estimator, unifying both the trivial and complex regimes.

3. The Difficult Region: Profile Maximum Likelihood and Pseudo-PML

When $p_x$ becomes small (roughly on the order of $(\log n)/n$ or below), or when $g$ is nonsmooth near $0$, empirical methods suffer from significant bias. For such hard regimes, one turns to the Profile Maximum Likelihood (PML) estimator:

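The PML distribution $p_{\mathrm{pml}}$ solves $p_{\mathrm{pml}} \in \arg\max_{p\in\Delta_N}\mathbb{P}(p,\phi)$, where $\mathbb{P}(p,\phi) = \sum_{x^n:\,\phi(x^n)=\phi}\prod_{i=1}^{n}p_{x_i}$ is the probability of observing the sample profile $\phi$ under $p$.
Acharya–Das–Orlitsky–Suresh (ADOS’16) established that substituting $p_{\mathrm{pml}}$ into $f$, i.e., using $f(p_{\mathrm{pml}})$, yields a single universal estimator that is sample-optimal for several symmetric properties, provided the target accuracy $\varepsilon$ is not too small.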
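Because PML is defined by an optimization over the simplex, tiny instances can be solved by brute force, which helps build intuition even though the cost grows exponentially. The Python sketch below is only an illustration under strong assumptions (a 3-symbol alphabet and a coarse grid); it is not one of the algorithms discussed here. It evaluates the profile probability $\mathbb{P}(p,\phi)$, the chance that a sample from $p$ has profile $\phi$, by enumerating all sequences, then grid-searches for the maximizer. For the profile of `aab` (one symbol seen twice, another once), the maximum $0.75$ is attained by a uniform distribution on two symbols, not by the uniform distribution on all three, which only reaches $2/3$.

```python
import itertools
import math
from collections import Counter

def profile_key(seq):
    """Profile as the sorted multiset of symbol counts (same information as phi)."""
    return tuple(sorted(Counter(seq).values(), reverse=True))

def profile_probability(p, target, n):
    """P(p, phi): total probability under p of all length-n sequences with profile `target`."""
    total = 0.0
    for seq in itertools.product(range(len(p)), repeat=n):
        if profile_key(seq) == target:
            total += math.prod(p[x] for x in seq)
    return total

def brute_force_pml(target, n, steps=20):
    """Grid search over distributions on a 3-symbol alphabet (exponential cost: tiny cases only)."""
    best_p, best_val = None, -1.0
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            p = (i / steps, j / steps, (steps - i - j) / steps)
            val = profile_probability(p, target, n)
            if val > best_val:
                best_p, best_val = p, val
    return best_p, best_val

target = profile_key("aab")                  # one symbol seen twice, another seen once
p_pml, likelihood = brute_force_pml(target, n=3)
print(sorted(p_pml, reverse=True), likelihood)             # [0.5, 0.5, 0.0] 0.75
print(profile_probability((1/3, 1/3, 1/3), target, 3))     # about 0.667
```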

However, computing the exact PML distribution is intractable in general: no efficient exact algorithm is known, and even evaluating the profile probability naively requires summing over exponentially many sequences. The PAS framework replaces exact PML with computationally feasible approximate variants, called pseudo-PML, by restricting attention to a subset $S\subseteq D$ of “difficult” symbols (typically those with small counts). The $S$-pseudo-profile $\phi_S$ (the profile computed only from symbols in $S$) and the corresponding pseudo-PML distribution are optimized only over $S$, which lowers the complexity enough for tractability (e.g., via convex relaxations or Sinkhorn scaling).
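Extracting the pseudo-profile itself is straightforward: it is the ordinary profile computed after discarding every sample whose symbol lies outside $S$. A minimal Python sketch (the function name and the example subset are hypothetical):

```python
from collections import Counter

def pseudo_profile(samples, S):
    """S-pseudo-profile as {j: number of symbols of S seen exactly j times}.

    Samples whose symbol is outside S are discarded before counting.
    """
    counts_in_S = Counter(x for x in samples if x in S)
    return dict(Counter(counts_in_S.values()))

samples = list("abracadabra")
S = {"b", "c", "d", "r"}                          # hypothetical hard subset
print(pseudo_profile(samples, S))                 # {2: 2, 1: 2}
print(pseudo_profile(samples, set(samples)))      # full profile: {5: 1, 2: 2, 1: 2}
```

With $S = D$ the pseudo-profile reduces to the ordinary profile, which is why pseudo-PML generalizes PML.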

4. The PAS Framework: Two-Stage Estimation and Sample Complexity

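PAS proceeds in two stages: one half of the samples decides which symbols are hard, and the other half is used to estimate the property on each group. Concretely: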

Sample Splitting: Split the samples into two independent halves, $X^{(1)}$ and $X^{(2)}$, of $n$ samples each.
Subset Selection: Use $X^{(1)}$ to define the hard subset $S = \{x\in D : n_x(X^{(1)})\in F\}$ (symbols whose frequency in $X^{(1)}$ falls in a target set $F$), and take $\bar S = D\setminus S$ as the good subset.
Pseudo-PML Estimation on $S$: On $X^{(2)}$, extract the $S$-pseudo-profile and compute a $\beta$-approximate pseudo-PML distribution $p_{\beta,S}$, i.e., one whose pseudo-profile likelihood is at least $\beta$ times the maximum.
Combined Estimation: For $x\in\bar S$, use the plug-in estimate $\hat p_x = n_x(X^{(2)})/n$ with bias correction. Return

$\hat f = \sum_{x\in\bar S}\tilde g(\hat p_x) + \sum_{x\in S} g\big(p_{\beta,S}(x)\big)$, where $\tilde g$ denotes the bias-corrected version of $g$.

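Main sample complexity result: for properties such as entropy and $\ell_1$-distance to uniformity, with a property-specific choice of the threshold set $F$, PAS attains $|\hat f - f(p)|\le\varepsilon$ with high probability as soon as

$n \gtrsim n^{*}_f(N,\varepsilon)$, where $n^{*}_f(N,\varepsilon)$ denotes the optimal sample complexity of estimating $f$ to accuracy $\varepsilon$, over a range of $\varepsilon$ that depends on the property.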

This matches the known lower bounds in both the easy and hard regimes, and holds over a broader range of $\varepsilon$ than earlier PML-based approaches.

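Comparison with previous approaches: unlike property-specific estimators built from polynomial approximations, PAS is a single universal recipe; compared with plugging in the full PML distribution [ADOS’16], it is sample-optimal over a broader range of $\varepsilon$ and more practical to compute.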

5. Algorithmic Structure and Implementation

The PAS algorithm can be summarized as follows:

Input: the samples, the property $f$ (through $g$), and a threshold set $F$.
Step 1: Split the samples into two independent halves $X^{(1)}$ and $X^{(2)}$.
Step 2: Define the hard subset $S = \{x : n_x(X^{(1)})\in F\}$ from $X^{(1)}$; let $\bar S = D\setminus S$.
Step 3: Extract the $S$-pseudo-profile from $X^{(2)}$.
Step 4: Compute a $\beta$-approximate $S$-pseudo-PML distribution $p_{\beta,S}$ using convex-concave surrogates, Sinkhorn scaling, or local methods.
Step 5: For $x\in\bar S$, use the bias-corrected plug-in $\tilde g(\hat p_x)$; for $x\in S$, use $g(p_{\beta,S}(x))$.
Step 6: Return the combined estimator $\hat f$ as above.
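The steps above can be sketched in Python as follows. This is a skeleton under stated assumptions, not the authors' implementation: the solver interface `pseudo_pml_solver(hard_counts, m)` is hypothetical, the bias correction on good symbols is omitted, and the usage example substitutes an empirical stand-in for the pseudo-PML solver, so its output reduces to the plug-in estimate on the second half.

```python
import math
import random
from collections import Counter

def pas_estimate(samples, g, F, pseudo_pml_solver, seed=0):
    """Skeleton of the two-stage PAS estimator for f(p) = sum_x g(p_x).

    `pseudo_pml_solver(hard_counts, m)` is a caller-supplied (hypothetical) function that
    receives the second-half counts of the hard symbols and the half size m, and returns
    probabilities for the hard part, e.g. an approximate S-pseudo-PML distribution.
    Symbols absent from both halves are ignored, which is harmless when g(0) = 0.
    """
    rng = random.Random(seed)
    samples = list(samples)
    rng.shuffle(samples)                                  # Step 1: random split
    half = len(samples) // 2
    first, second = samples[:half], samples[half:2 * half]

    c1, c2 = Counter(first), Counter(second)
    m = len(second)
    symbols = set(c1) | set(c2)
    S = {x for x in symbols if c1[x] in F}                # Step 2: hard subset from half 1

    hard_counts = [c2[x] for x in S if c2[x] > 0]         # Step 3: S-pseudo-profile (as counts)
    q_hard = pseudo_pml_solver(hard_counts, m)            # Step 4: delegated to the solver

    good = sum(g(c2[x] / m) for x in symbols - S)         # Step 5: plug-in on good symbols
    hard = sum(g(q) for q in q_hard)                      #         (bias correction omitted)
    return good + hard                                    # Step 6: combine

def entropy_term(t):
    """g(t) = -t log t with g(0) = 0."""
    return -t * math.log(t) if t > 0 else 0.0

def empirical_stand_in(hard_counts, m):
    """Placeholder only: empirical frequencies, NOT a pseudo-PML solution."""
    return [c / m for c in hard_counts]

data = list(range(50)) * 40                               # 2,000 samples, uniform on 50 symbols
est = pas_estimate(data, entropy_term, F={0, 1, 2, 3}, pseudo_pml_solver=empirical_stand_in)
print(est, math.log(50))
```

Passing the solver in as a function keeps the splitting and combining logic independent of how the pseudo-PML step is approximated.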

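The pseudo-PML optimization dominates the running time, but because it involves only the symbols in $S$, it remains practical when $S$ (or the number of distinct counts in its pseudo-profile) is small.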

6. Applications and Worked Examples

PAS achieves near-optimal sample complexity in the estimation of core symmetric properties:

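Property	Definition	Optimal sample complexity (accuracy $\varepsilon$)
Shannon entropy	$H(p) = -\sum_x p_x\log p_x$	$\Theta\big(N/(\varepsilon\log N) + \log^2 N/\varepsilon^2\big)$
$\ell_1$ distance to uniformity	$\sum_x |p_x - 1/N|$	$\Theta\big(N/(\varepsilon^2\log N)\big)$, for $\varepsilon$ not too small
Support size, when every nonzero $p_x \ge 1/k$	$|\{x : p_x > 0\}|$, to within $\pm\varepsilon k$	$\Theta\big((k/\log k)\log^2(1/\varepsilon)\big)$, for $\varepsilon$ not too small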

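For entropy, PAS transitions from the plug-in estimator on frequently observed symbols to pseudo-PML on the rare-symbol tail, capturing the missing-mass behavior that the plug-in estimator misses. For support size estimation, the PML plug-in alone is reported to be minimax-optimal across the range of $\varepsilon$ considered.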
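To see where the plug-in estimator already suffices for entropy, one can compare the orders of the plug-in and optimal rates numerically. The sketch below uses the standard rates $\Theta(N/\varepsilon + \log^2 N/\varepsilon^2)$ for plug-in and $\Theta(N/(\varepsilon\log N) + \log^2 N/\varepsilon^2)$ for optimal estimation, with all constants dropped (an assumption), so only the trend of the ratio is meaningful: it is large for moderate $\varepsilon$ and approaches 1 once $\varepsilon \lesssim \log^2 N/N$.

```python
import math

def plugin_entropy_rate(N, eps):
    """Order of the plug-in entropy sample complexity (constants dropped)."""
    return N / eps + math.log(N) ** 2 / eps ** 2

def optimal_entropy_rate(N, eps):
    """Order of the optimal entropy sample complexity (constants dropped)."""
    return N / (eps * math.log(N)) + math.log(N) ** 2 / eps ** 2

N = 10 ** 6
for eps in (1e-1, 1e-3, 1e-5):
    ratio = plugin_entropy_rate(N, eps) / optimal_entropy_rate(N, eps)
    print(f"eps={eps:g}: plug-in / optimal = {ratio:.2f}")
```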

7. Summary and Further Directions

The PAS framework (Charikar et al., 2020) provides a unified, sample-optimal estimation strategy for a broad class of separable symmetric properties, combining plug-in estimators with tractable PML-based correction. The key algorithmic insight is to confine the expensive optimization to the small “hard” subset, which keeps computation feasible while matching information-theoretic lower bounds. Open questions include better polynomial-time approximation algorithms for general PML, extensions to non-separable properties (e.g., Rényi entropy), and generalizations of PAS to richer statistical settings (e.g., multi-sample estimation and testing).

A secondary usage of the name, PAS for geometric symmetry detection (Korman et al., 2014), follows similar probabilistic-approximate principles. Here, a rigid transformation $T$ is an $\varepsilon$-symmetry of a shape $I$ if its distortion, the $L_1$ difference $\int_{B}|I(z) - I(T(z))|\,dz$ between the shape and its transformed copy over a ball $B$, is at most $\varepsilon$. The algorithm samples candidate transformations at a density tied to the total variation of the shape, estimates each candidate's distortion from a sublinear number of random point samples, and with probability at least $1-\delta$ achieves the user-specified accuracy $\varepsilon$, with a provable bound on its running time.
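The distortion-estimation idea can be illustrated with a small Monte Carlo sketch in Python. Everything here is an assumption made for illustration, not Korman et al.'s algorithm: the shape is the indicator $I$ of a square in the plane, the distortion of a rigid map $T$ is normalized as the average of $|I(z) - I(T(z))|$ over the unit disk, and only the random-sampling step is shown, without the search over a net of transformations.

```python
import math
import random

def square(x, y, half_side=0.5):
    """Indicator function of an axis-aligned square centred at the origin (example shape)."""
    return 1.0 if abs(x) <= half_side and abs(y) <= half_side else 0.0

def rotation(theta):
    """Rigid rotation about the origin by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return lambda x, y: (c * x - s * y, s * x + c * y)

def quarter_turn(x, y):
    """Exact 90-degree rotation, written without trigonometry to avoid rounding."""
    return -y, x

def estimate_distortion(shape, T, num_samples=2000, seed=0):
    """Monte Carlo estimate of the mean of |shape(z) - shape(T(z))| over the unit disk."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        while True:                          # rejection sampling: uniform point in the disk
            x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
            if x * x + y * y <= 1.0:
                break
        total += abs(shape(x, y) - shape(*T(x, y)))
    return total / num_samples

print(estimate_distortion(square, quarter_turn))             # 0.0: an exact symmetry
print(estimate_distortion(square, rotation(math.pi / 4)))    # near 2(3 - 2*sqrt(2))/pi, about 0.109
```

The number of samples does not depend on any pixel or mesh resolution, which is the sense in which the distortion check is sublinear.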

Both paradigms show the power of combining probabilistic correctness guarantees with approximate or subsampled optimization, achieving theoretical tightness together with computational practicality for symmetric property estimation in statistics and symmetry detection in geometry (Charikar et al., 2020; Korman et al., 2014).

References

Charikar et al. (2020). A General Framework for Symmetric Property Estimation.

Korman et al. (2014). Probably Approximately Symmetric: Fast Rigid Symmetry Detection with Global Guarantees.