
Source Attribution Percentage Matrix

Updated 8 October 2025
  • The Source Attribution Percentage Matrix is a quantitative framework that partitions multivariate pollutant data into source-specific percentage contributions using geometric estimators.
  • It addresses classical NMF limitations by ensuring scale invariance and relaxing strict purity and sparsity assumptions through convex hull geometry.
  • Numerical simulations confirm its consistency and accuracy in source recovery, making it a practical tool for air pollution control and regulatory assessment.

The Source Attribution Percentage Matrix is a quantitative framework devised for attributing observed multivariate signals—such as concentrations of air pollutants—to their contributing sources, resolving two major classical limitations in non-negative matrix factorization (NMF): non-uniqueness and the need for restrictive assumptions. By defining the estimand at the population level and constructing consistent geometric estimators, the matrix enables robust, interpretable partitioning of observed concentrations into percentages attributable to each source, while ensuring invariance to rescaling and relaxing classical NMF requirements (Jin et al., 4 Oct 2025).

1. Formal Definition and Identifiability

Let $Y \in \mathbb{R}_+^{n \times J}$ be the observed data matrix, with $n$ samples (e.g., time points) and $J$ features (e.g., pollutant concentrations). The standard NMF representation models $Y = W H$, where $W \in \mathbb{R}_+^{n \times K}$ encodes the sample-level contributions (emissions) from $K$ sources and $H \in \mathbb{R}_+^{K \times J}$ encodes the per-unit source profiles across pollutants.

The central construct, the Source Attribution Percentage Matrix $\Phi$, is defined elementwise as

$$\phi_{kj} = \frac{\mu_k H_{kj}}{\sum_\ell \mu_\ell H_{\ell j}},$$

where $\mu_k = E(W_{ik})$ is the expected emission from source $k$. Thus, each $\phi_{kj}$ represents the population-level fraction of the concentration of pollutant $j$ attributable to source $k$.
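As a minimal sketch of this definition, the snippet below computes the matrix from a given pair of factors, using the column means of $W$ as the estimate of $\mu_k$. The function name `attribution_matrix` and the toy values are illustrative assumptions, not taken from the paper, and the code assumes every pollutant has a strictly positive total contribution.

```python
import numpy as np

def attribution_matrix(W, H):
    """Return Phi (K x J) with Phi[k, j] = mu_k * H[k, j] / sum_l mu_l * H[l, j]."""
    W = np.asarray(W, dtype=float)
    H = np.asarray(H, dtype=float)
    mu = W.mean(axis=0)            # sample estimate of mu_k = E(W_ik)
    contrib = mu[:, None] * H      # K x J matrix of mu_k * H[k, j]
    # Assumes every column of contrib has a positive sum.
    return contrib / contrib.sum(axis=0, keepdims=True)

# Hypothetical toy factors: 2 samples, 2 sources, 3 pollutants.
W = np.array([[1.0, 3.0],
              [3.0, 1.0]])         # mean emissions mu = [2, 2]
H = np.array([[1.0, 0.0, 2.0],
              [1.0, 4.0, 2.0]])
Phi = attribution_matrix(W, H)
print(Phi)
```

Each column of `Phi` sums to one, so a column can be read directly as the percentage split of one pollutant across the sources.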

Key identifiability results show that $\Phi$ is uniquely defined under two conditions:

  • The emission process $\{W_i\}$ is stationary and ergodic (allowing empirical estimates of $\mu_k$ via sample averaging).
  • The emission distribution is probabilistically separable—i.e., puts positive probability on regions in the emission simplex close to each canonical direction, ensuring near-pure source patterns can be observed without requiring strict 'pure pixel' or sparsity constraints.

The matrix $\Phi$ remains invariant under arbitrary diagonal rescalings of $W$ and $H$, unlike the factors themselves. This property is critical for the interpretation and comparison of source contributions across studies and measurement units.

2. Geometric Estimation Methodology

The estimation procedure exploits the conical geometry of the data induced by the NMF model:

  • Row-normalize each sample: $Y^*_i = Y_i / r_i$, where $r_i = \sum_j Y_{ij}$, to represent each sample in the canonical simplex.
  • The normalized source profiles $h_k^* = h_k / (\sum_j h_{kj})$ form the vertices of the convex polytope governing the data distribution.
  • The convex hull of the normalized data approximates this polytope, and the model estimates the $K$ vertices by finding the $K$ points that maximize the $(K-1)$-dimensional volume within the convex hull.
  • Once $H^*$ is determined, the estimator for $\Phi$ is constructed from the sample means of the recovered emissions $W_i$, which stand in for the population means $\mu_k$ when the sample size is large enough.
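The steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the vertex search is a brute-force scan over $K$-subsets of the normalized data points (practical only for small $n$), and the emissions are recovered with non-negative least squares via `scipy.optimize.nnls`, which is a choice made here for the sketch. The function name `estimate_phi` and the simulated data are hypothetical.

```python
from itertools import combinations

import numpy as np
from scipy.optimize import nnls

def estimate_phi(Y, K):
    """Sketch: max-volume vertex search on row-normalized data, then Phi."""
    Y = np.asarray(Y, dtype=float)
    Ystar = Y / Y.sum(axis=1, keepdims=True)      # rows now lie on the simplex
    best_vol, best_idx = -1.0, None
    for idx in combinations(range(len(Ystar)), K):
        V = Ystar[list(idx)]
        E = V[1:] - V[0]                           # (K-1) x J edge vectors
        vol = np.linalg.det(E @ E.T)               # squared volume up to a constant
        if vol > best_vol:
            best_vol, best_idx = vol, idx
    Hstar = Ystar[list(best_idx)]                  # estimated normalized profiles
    # Assumption of this sketch: recover emissions row by row with NNLS.
    W_hat = np.array([nnls(Hstar.T, y)[0] for y in Y])
    mu = W_hat.mean(axis=0)
    contrib = mu[:, None] * Hstar
    return contrib / contrib.sum(axis=0, keepdims=True), Hstar

# Hypothetical noise-free data: 3 sources, 4 pollutants, 43 samples.
rng = np.random.default_rng(0)
H = np.array([[4.0, 1.0, 0.5, 0.5],
              [0.5, 3.0, 1.0, 0.5],
              [0.2, 0.3, 1.0, 2.5]])
mix = rng.dirichlet(np.ones(3), size=40) * rng.uniform(1.0, 5.0, size=(40, 1))
W = np.vstack([2.0 * np.eye(3), mix])              # first three rows are pure-source samples
Y = W @ H
Phi_hat, Hstar = estimate_phi(Y, K=3)
print(np.round(Phi_hat, 3))
```

Because the toy data contain one exactly pure sample per source, the search recovers the normalized profiles exactly. With only near-pure samples, as the separability condition allows, the recovered vertices would be approximate and would improve as more samples arrive.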

The estimator's consistency (convergence to the true $\Phi$) holds under both independent and dependent emission processes (e.g., AR(1) models). The Hausdorff distance between the empirical convex hull and the true polytope vanishes as sample size grows.
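To illustrate why serial dependence does not prevent estimating the mean emissions by averaging, the sketch below simulates a single non-negative AR(1) emission series and tracks its running mean. The specific process (autoregressive coefficient `a = 0.5` with exponential innovations) and the helper name `simulate_ar1_emissions` are assumptions chosen for the example, not the simulation design of the paper.

```python
import numpy as np

def simulate_ar1_emissions(n, a=0.5, seed=0):
    """Non-negative AR(1) series: w[t] = a * w[t-1] + shock[t], shock ~ Exp(1)."""
    rng = np.random.default_rng(seed)
    shocks = rng.exponential(scale=1.0, size=n)   # non-negative innovations
    w = np.empty(n)
    prev = 1.0 / (1.0 - a)                        # start at the stationary mean
    for t in range(n):
        prev = a * prev + shocks[t]
        w[t] = prev
    return w

a = 0.5
w = simulate_ar1_emissions(20000, a=a)
stationary_mean = 1.0 / (1.0 - a)                 # E(W) = E(shock) / (1 - a)
running_mean = np.cumsum(w) / np.arange(1, len(w) + 1)
print(stationary_mean, running_mean[-1])
```

The running mean settles near the stationary mean even though consecutive values are correlated, which is the property the ergodicity condition relies on.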

3. Mathematical Properties and Scale Invariance

Unlike classical NMF, which is only defined up to rescaling by positive diagonal matrices, the Source Attribution Percentage Matrix is scale (and unit) invariant. For any positive diagonal scaling $D$, $WH = (WD)(D^{-1}H)$ leaves $Y$ unchanged, yet changes $W$ and $H$ arbitrarily. In contrast, the ratio

$$\phi_{kj} = \frac{\mu_k H_{kj}}{\sum_\ell \mu_\ell H_{\ell j}}$$

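remains unchanged: the $k$-th diagonal entry $d_k$ of $D$ multiplies $\mu_k$ and divides $H_{kj}$, so each product $\mu_k H_{kj}$ is preserved, and a change of units for pollutant $j$ multiplies numerator and denominator by the same constant. The attribution percentages are therefore stable and comparable even between studies with differing normalization or instrument calibrations.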
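A quick numerical check of both invariances, as a sketch with arbitrary made-up factors: the diagonal entries `d` and the per-pollutant unit conversion factors `c` are hypothetical values chosen only for illustration.

```python
import numpy as np

def attribution_matrix(W, H):
    """Phi[k, j] = mu_k * H[k, j] / sum_l mu_l * H[l, j], with mu from column means of W."""
    mu = np.asarray(W, dtype=float).mean(axis=0)
    contrib = mu[:, None] * np.asarray(H, dtype=float)
    return contrib / contrib.sum(axis=0, keepdims=True)

rng = np.random.default_rng(1)
W = rng.uniform(0.5, 2.0, size=(50, 3))          # 50 samples, 3 sources
H = rng.uniform(0.1, 1.0, size=(3, 5))           # 3 sources, 5 pollutants

d = np.array([0.1, 7.0, 42.0])                   # diagonal of an arbitrary positive D
c = np.array([1000.0, 1.0, 0.001, 3.6, 2.0])     # hypothetical unit change per pollutant

Phi = attribution_matrix(W, H)
Phi_rescaled = attribution_matrix(W * d, H / d[:, None])   # (W D, D^{-1} H)
Phi_units = attribution_matrix(W, H * c)                    # pollutant j measured in new units

print(np.allclose((W * d) @ (H / d[:, None]), W @ H))
print(np.allclose(Phi, Phi_rescaled), np.allclose(Phi, Phi_units))
```

The factors themselves differ by orders of magnitude between the three calls, while the attribution matrix agrees to floating-point precision.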

4. Numerical Validation and Convergence

Simulation experiments, spanning both independent identically distributed and serially dependent $W_i$ emissions, confirm the geometric estimator's performance:

  • Normalized root mean squared error (NRMSE) and Frobenius norm distances between true and estimated $\Phi$ matrices decrease with increasing sample size $n$, supporting consistency.
  • Scatter plots of estimated vs. true $\phi_{kj}$ entries exhibit points tightly clustered near the diagonal for moderate-to-large $n$.
  • The maximum volume polytope algorithm robustly locates source vertices in the empirical simplex, leading to reliable source profile recovery and, via averaging, accurate estimation of $\Phi$.
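The two error measures named above can be sketched as below. The text does not state how the NRMSE is normalized, so dividing the RMSE by the root mean square of the true entries is an assumption of this example; other conventions (such as dividing by the range of the true entries) would give different numbers. The matrices are made-up values.

```python
import numpy as np

def frobenius_distance(phi_true, phi_est):
    """Frobenius norm of the difference between two attribution matrices."""
    diff = np.asarray(phi_est, dtype=float) - np.asarray(phi_true, dtype=float)
    return float(np.linalg.norm(diff, ord="fro"))

def nrmse(phi_true, phi_est):
    """RMSE divided by the RMS of the true entries (normalization is an assumption)."""
    phi_true = np.asarray(phi_true, dtype=float)
    phi_est = np.asarray(phi_est, dtype=float)
    rmse = np.sqrt(np.mean((phi_est - phi_true) ** 2))
    return float(rmse / np.sqrt(np.mean(phi_true ** 2)))

# Hypothetical true and estimated matrices for 2 sources and 2 pollutants.
phi_true = np.array([[0.5, 0.5],
                     [0.5, 0.5]])
phi_est = np.array([[0.6, 0.4],
                    [0.4, 0.6]])

print(frobenius_distance(phi_true, phi_est), nrmse(phi_true, phi_est))
```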

5. Practical Application and Interpretability

Practitioners obtain an immediately interpretable matrix $\Phi$ quantifying the proportion of each observed feature (pollutant) attributable to each source. This representation:

  • Enables targeted mitigation policies by identifying dominant sources for each pollutant.
  • Facilitates cross-paper and cross-site comparability even when measurement units or experimental designs differ.
  • Avoids pitfalls associated with arbitrary sparsity or normalization assumptions in traditional NMF, increasing robustness to model specification.
  • Informs regulatory frameworks and reporting standards for source apportionment, since the matrix provides actionable, scale-free attribution percentages.
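As a small usage sketch, reading off the dominant source for each pollutant is a column-wise argmax of the matrix. The source names, pollutant names, and percentages below are entirely hypothetical and serve only to show the lookup.

```python
import numpy as np

# Hypothetical attribution matrix: rows are sources, columns are pollutants.
sources = ["traffic", "industry", "biomass"]
pollutants = ["NO2", "SO2", "PM2.5"]
Phi = np.array([[0.70, 0.10, 0.25],
                [0.20, 0.60, 0.30],
                [0.10, 0.30, 0.45]])

dominant = {}
for j, pollutant in enumerate(pollutants):
    k = int(np.argmax(Phi[:, j]))                 # source with the largest share
    dominant[pollutant] = (sources[k], float(Phi[k, j]))

for pollutant, (source, share) in dominant.items():
    print(f"{pollutant}: {source} ({share:.0%})")
```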

6. Broader Implications and Extensions

The population-level, geometry-based identification strategy transcends the limitations of interpretable factorization (where factors are not uniquely defined and often tangled with normalization choices). By focusing on convex hull geometry and probabilistic separability, the framework is applicable to a broad class of problems—not merely atmospheric pollution, but any scenario requiring attribution of multivariate measurements to latent sources via NMF. The geometric estimator requires minimal parametric assumptions and accommodates temporal or spatial dependence in source emissions, further increasing its real-world utility.

This approach formalizes a rigorous, interpretable, and robust methodology for source attribution analysis and defines a new standard for reporting and comparing source contributions in multi-feature observational studies.
