
Maximum Correntropy Criterion

Updated 10 April 2026
  • Maximum correntropy criterion is an information-theoretic method that uses a Gaussian kernel to down-weight outliers and impulsive noise.
  • It is widely applied in adaptive filtering, compressive sensing, and system identification to achieve improved convergence and robustness.
  • Its implementation involves iterative reweighted optimization and careful kernel bandwidth selection to balance sensitivity and stability.

The maximum correntropy criterion (MCC) is a robust, information-theoretic alternative to traditional quadratic optimality criteria such as mean square error (MSE), extensively utilized in signal processing, adaptive filtering, robust statistics, compressive sensing, learning systems, and control theory. MCC replaces the quadratic penalty on errors with a local Gaussian kernel measure, yielding resistance to impulsive noise and outliers, enhanced robustness in heavy-tailed environments, and improved convergence in a wide range of practical applications.

1. Definition and Mathematical Formulation

Correntropy is a similarity measure between two scalar random variables $X$ and $Y$, defined as the expectation of a positive-definite kernel evaluated at $X - Y$. The most prevalent choice is the Gaussian kernel: $V_\sigma(X, Y) = E\left[ \kappa_\sigma(X, Y) \right] = E\left[ \exp \left( -\frac{(X - Y)^2}{2\sigma^2} \right) \right]$ where $\sigma > 0$ denotes the kernel bandwidth. In empirical settings with samples $\{(x_i, y_i)\}_{i=1}^N$, the sample correntropy estimator is

$$\widehat{V}_\sigma(X, Y) = \frac{1}{N} \sum_{i=1}^N \exp \left( -\frac{(x_i - y_i)^2}{2\sigma^2} \right)$$
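The estimator above translates almost directly into code. The following is a minimal sketch using NumPy; the function name `sample_correntropy` and the toy data are illustrative assumptions, not taken from any cited paper.

```python
import numpy as np

def sample_correntropy(x, y, sigma):
    """Empirical correntropy: mean Gaussian kernel of the differences x_i - y_i."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    diff = x - y
    return float(np.mean(np.exp(-diff**2 / (2.0 * sigma**2))))

# Identical samples give the maximum value 1. A single gross outlier lowers
# the average by at most 1/N, because every kernel term lies in [0, 1].
clean = sample_correntropy([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0], sigma=1.0)
spiky = sample_correntropy([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 400.0], sigma=1.0)
```

Because each term is bounded, the size of the outlier is irrelevant once it is far beyond the bandwidth; only the fraction of such samples affects the value.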

The maximum correntropy criterion seeks model parameters that maximize sample correntropy between predictions and observations. In regression,

$$J_{\mathrm{MCC}}(\theta) = \frac{1}{N} \sum_{i=1}^N \exp\left( -\frac{e_i(\theta)^2}{2\sigma^2} \right), \quad e_i(\theta) = y_i - f_\theta(x_i)$$

This is typically recast as minimizing the correntropy-induced loss: $\ell_\sigma(e) = 1 - \exp\left(-\frac{e^2}{2\sigma^2}\right)$ which is a bounded, redescending loss function (Welsch M-estimator).

The kernel parameter $\sigma$ calibrates the trade-off between sensitivity and robustness: as $\sigma \to \infty$, MCC reduces to MSE; as $\sigma \to 0$, the loss becomes akin to zero-one loss, aggressively disregarding large deviations (Chen et al., 2015, Chen et al., 2017, Jing et al., 2021).
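The two limiting regimes of the bandwidth can be checked numerically. The sketch below assumes the correntropy-induced loss $1 - \exp(-e^2 / (2\sigma^2))$ and rescales it by $2\sigma^2$ so that the large-bandwidth limit can be compared with the squared error; the function name and test values are illustrative.

```python
import numpy as np

def correntropy_loss(e, sigma):
    """Correntropy-induced (Welsch) loss 1 - exp(-e^2 / (2 sigma^2))."""
    e = np.asarray(e, dtype=float)
    # -expm1(-t) equals 1 - exp(-t) but stays accurate when t is tiny.
    return -np.expm1(-e**2 / (2.0 * sigma**2))

errors = np.array([0.0, 0.5, 1.0, 5.0, 50.0])

# Large bandwidth: 2 sigma^2 * loss approaches the squared error e^2.
large_sigma = 1e4
rescaled = 2.0 * large_sigma**2 * correntropy_loss(errors, large_sigma)

# Small bandwidth: the loss approaches an indicator of e != 0.
small_sigma_loss = correntropy_loss(errors, 1e-3)
```

The equivalence with MSE holds up to the $2\sigma^2$ scaling, which does not change the minimizer.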

2. Robustness Properties and Theoretical Analysis

MCC exhibits fundamental robustness due to the exponential decay of the Gaussian kernel. Outliers, i.e., errors with $|e| \gg \sigma$, receive exponentially vanishing weight and exert negligible influence on the estimator. Thus, MCC is highly robust with respect to impulsive and heavy-tailed noise, outperforming quadratic or linear losses in the presence of outliers (Chen et al., 2017).
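The vanishing influence of outliers is easiest to see in the simplest possible model, estimating a single constant. The sketch below is an illustration under that assumption: it compares the sample mean (the MSE solution) with a fixed-point MCC estimate, which is a kernel-weighted mean. The median initialization and the toy data are choices made here.

```python
import numpy as np

def mcc_location(y, sigma, n_iter=50):
    """Fixed-point MCC estimate of a constant: a kernel-weighted mean of y."""
    y = np.asarray(y, dtype=float)
    theta = float(np.median(y))  # robust starting point (an assumption of this sketch)
    for _ in range(n_iter):
        w = np.exp(-(y - theta) ** 2 / (2.0 * sigma**2))  # outliers get weight ~0
        theta = float(np.sum(w * y) / np.sum(w))
    return theta

inliers = 2.0 + np.array([-0.2, -0.1, 0.0, 0.1, 0.2, -0.05, 0.05])
outliers = np.array([1000.0, -500.0, 2000.0])
y = np.concatenate([inliers, outliers])

theta_mse = float(np.mean(y))           # least-squares estimate, dragged by the outliers
theta_mcc = mcc_location(y, sigma=1.0)  # stays near the inlier cluster around 2.0
```

Making the outliers larger changes `theta_mse` proportionally but leaves `theta_mcc` essentially unchanged, since their kernel weights are already numerically zero.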

Formal robustness guarantees have been established: in errors-in-variables (EIV) models with both input and output outliers, the MCC solution remains within a bounded interval of the true parameter, provided that a majority of samples are "good" and the kernel width $\sigma$ is appropriately chosen. The bound is expressed in terms of the noise bounds and a constant that decreases as the fraction of inliers grows (Chen et al., 2017). MCC can thus tolerate arbitrarily large outliers if a sufficient inlier majority exists.

Statistical learning theory for MCC regression (MCCR) demonstrates that, with an appropriately chosen scale parameter $\sigma$, the estimator achieves the optimal learning rate, outperforming the Huber loss and least squares in regimes contaminated by outliers (Jing et al., 2021). Empirical results confirm these theoretical predictions.

3. Algorithmic Integration and Optimization Schemes

Adaptive Filtering and Sparsity

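MCC is commonly integrated into adaptive filtering as a robust alternative to LMS or RLS: $\mathbf{w}_{k+1} = \mathbf{w}_k + \mu \exp\left(-\frac{e_k^2}{2\sigma^2}\right) e_k \mathbf{x}_k$ where $e_k = d_k - \mathbf{w}_k^\top \mathbf{x}_k$ and $\mu$ is the step size, with constant factors absorbed into $\mu$ (He et al., 2017, Peng et al., 2016).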
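A stochastic-gradient MCC filter differs from LMS only in that each correction is scaled by a Gaussian kernel of the instantaneous error. The following is a minimal sketch of that idea for system identification under impulsive noise; the function name `mcc_lms`, the parameter values, and the synthetic noise model are assumptions made for illustration.

```python
import numpy as np

def mcc_lms(X, d, mu=0.02, sigma=2.0):
    """LMS-style update whose correction is scaled by exp(-e^2 / (2 sigma^2))."""
    w = np.zeros(X.shape[1])
    for x_k, d_k in zip(X, d):
        e = d_k - x_k @ w                         # a priori error
        kernel = np.exp(-e**2 / (2.0 * sigma**2)) # near 0 for impulsive errors
        w = w + mu * kernel * e * x_k
    return w

rng = np.random.default_rng(0)
w_true = np.array([0.5, -1.0, 2.0])
X = rng.standard_normal((5000, 3))

# Small Gaussian noise plus occasional large impulses (about 5% of samples).
noise = 0.05 * rng.standard_normal(5000)
impulses = rng.random(5000) < 0.05
noise[impulses] += 50.0 * rng.standard_normal(int(impulses.sum()))

d = X @ w_true + noise
w_hat = mcc_lms(X, d)
```

With the kernel factor removed, the same loop is plain LMS, and each impulse would then shift the weights by an amount proportional to its magnitude.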

For sparsity-inducing applications (e.g., compressive sensing, channel estimation), $\ell_0$- or $\ell_1$-norm regularization, or the correntropy-induced metric (CIM), is added to the MCC cost as a penalty term. Proximal, zero-attraction, or reweighted penalty techniques are utilized for non-differentiable terms (Ma et al., 2015, He et al., 2017).
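One way to picture the zero-attraction technique is as an extra shrinkage step after each robust update. The sketch below assumes an $\ell_1$ penalty, whose subgradient is the sign of the weights; the function name `za_mcc`, the attraction strength `rho`, and the synthetic sparse system are illustrative assumptions rather than the exact algorithms of the cited papers.

```python
import numpy as np

def za_mcc(X, d, mu=0.02, sigma=2.0, rho=1e-4):
    """Kernel-weighted stochastic-gradient update plus an l1 zero-attraction term."""
    w = np.zeros(X.shape[1])
    for x_k, d_k in zip(X, d):
        e = d_k - x_k @ w
        w = w + mu * np.exp(-e**2 / (2.0 * sigma**2)) * e * x_k  # robust data term
        w = w - rho * np.sign(w)                                  # pulls small taps toward zero
    return w

rng = np.random.default_rng(1)
w_true = np.zeros(16)
w_true[[3, 10]] = [1.0, -0.5]          # sparse system: 2 active taps out of 16
X = rng.standard_normal((8000, 16))

noise = 0.05 * rng.standard_normal(8000)
hits = rng.random(8000) < 0.05
noise[hits] += 50.0 * rng.standard_normal(int(hits.sum()))

d = X @ w_true + noise
w_hat = za_mcc(X, d)
```

Setting `rho=0` removes the attraction step, leaving the unregularized kernel-weighted update.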

Distributed and Recursive Estimation

Diffusion MCC algorithms for distributed networks maintain the same per-iteration complexity as standard LMS, with robustness to impulsive noise and mean/mean-square convergence guaranteed under standard step-size conditions (Ma et al., 2015). Recursive MCC (RMCC) and its sparsity-aware extensions utilize gain matrices and forgetting factors for rapid adaptation in nonstationary or sparse settings (Qin et al., 2022).

Kalman and Nonlinear Filtering

MCC has been embedded in the Kalman filtering framework (MCC-KF, MCKF) by replacing the classical MMSE update with a fixed-point or contraction mapping induced by the MCC loss, in which a scalar gain-scaling factor is computed via Gaussian kernel weights on the innovation terms (Chen et al., 2015, Kulikova, 2023). Square-root implementations (Cholesky, UD, SVD-based) further enhance numerical robustness in high-reliability settings (Kulikova, 2023).
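The effect of kernel-based gain scaling can be illustrated with a deliberately simplified measurement update. This sketch is not the MCKF or MCC-KF recursion from the cited papers: it performs a single pass with no fixed-point iteration, and it assumes the kernel weight is computed from the innovation normalized by the measurement noise covariance. The weight then inflates that covariance, which shrinks the gain for outlying measurements. All names are hypothetical.

```python
import numpy as np

def mcc_scaled_update(x_pred, P_pred, z, H, R, sigma):
    """Kalman measurement update with the measurement down-weighted by a kernel factor."""
    innov = z - H @ x_pred
    # Squared innovation, normalized by the measurement noise covariance.
    r2 = float(innov @ np.linalg.solve(R, innov))
    lam = max(float(np.exp(-r2 / (2.0 * sigma**2))), 1e-12)  # weight in (0, 1], floored
    S = H @ P_pred @ H.T + R / lam           # small lam means inflated measurement noise
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ innov
    P_new = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x_new, P_new, lam

x_pred = np.zeros(2)
P_pred = np.eye(2)
H = np.array([[1.0, 0.0]])
R = np.array([[1.0]])

x_in, _, lam_in = mcc_scaled_update(x_pred, P_pred, np.array([0.5]), H, R, sigma=2.0)
x_out, _, lam_out = mcc_scaled_update(x_pred, P_pred, np.array([100.0]), H, R, sigma=2.0)
```

For the consistent measurement the update is close to the standard Kalman result of 0.25, while the outlying measurement is effectively ignored.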

Half-Quadratic and Reweighted Solvers

Optimization of MCC objectives, being non-quadratic, often employs half-quadratic (HQ) or iterative reweighted least-squares approaches. At each iteration:

  1. Compute residuals $e_i$ and weights $w_i = \exp\left(-\frac{e_i^2}{2\sigma^2}\right)$.
  2. Solve a weighted least-squares (possibly with regularization or constraints).
  3. Alternate weight and parameter updates until convergence (He et al., 2019, Zou et al., 2016).
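The alternation between weights and parameters can be sketched for linear regression as follows. This is a minimal illustration, assuming Gaussian kernel weights on the residuals, an ordinary least-squares starting point, and a small ridge term for numerical safety; the function name `mcc_irls` and the synthetic data are not from the cited papers.

```python
import numpy as np

def mcc_irls(X, y, sigma, n_iter=30, tol=1e-10, ridge=1e-8):
    """Iteratively reweighted least squares for the MCC objective (linear model)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    theta = np.linalg.lstsq(X, y, rcond=None)[0]   # ordinary least-squares start
    for _ in range(n_iter):
        e = y - X @ theta                          # step 1: residuals ...
        w = np.exp(-e**2 / (2.0 * sigma**2))       # ... and kernel weights
        A = X.T @ (w[:, None] * X) + ridge * np.eye(X.shape[1])
        b = X.T @ (w * y)
        theta_new = np.linalg.solve(A, b)          # step 2: weighted least squares
        if np.linalg.norm(theta_new - theta) < tol:
            theta = theta_new
            break
        theta = theta_new                          # step 3: alternate until convergence
    return theta

x = np.linspace(0.0, 1.0, 50)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + 0.01 * np.sin(np.arange(50))
y[::10] += 20.0                                    # five gross outliers

theta_ls = np.linalg.lstsq(X, y, rcond=None)[0]    # biased by the outliers
theta_mcc = mcc_irls(X, y, sigma=1.0)              # close to intercept 1, slope 2
```

The result depends on the starting point when outliers are extreme relative to the bandwidth: if every initial residual is far larger than `sigma`, all weights vanish, so a robust initialization or a gradually shrinking bandwidth is a sensible precaution.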

These methods are employed in robust regression, matrix completion, and broad learning systems (Zheng et al., 2019, He et al., 2019).

4. Applications

Robust Estimation and System Identification

MCC algorithms have demonstrated superior performance in system identification under impulsive, heavy-tailed, or phase noise. Constrained MCC filters (CMCC) offer robust alternatives for scenarios with linear constraints, achieving lower steady-state mean-square deviation (MSD) than constrained LMS or RLS under non-Gaussian noise (Peng et al., 2016, Ma et al., 2017).

Sparse Adaptive Filtering and Channel Estimation

Sparsity-promoting MCC algorithms (e.g., CIMMCC and norm-penalized MCC variants) are state-of-the-art for robust sparse channel estimation in non-Gaussian environments, achieving faster tracking and lower steady-state MSD than MSE-based or conventional norm-penalized methods (Ma et al., 2015, He et al., 2017).

Distributed Estimation and Beamforming

Diffusion MCC and recursive MCC algorithms provide robust alternatives in sensor networks or beamforming, particularly in the presence of $\alpha$-stable or impulsive disturbances (Ma et al., 2015, Lu et al., 2016). The kernel width $\sigma$ is a critical tuning parameter, dictating the trade-off between adaptation speed and robustness.

Robust Machine Learning and Value Decomposition

MCC has been applied to deep reinforcement learning for robust value decomposition (MCVD), replacing fixed or ad-hoc weighting in TD-error loss with the dynamic, error-adaptive MCC weight, yielding robust performance across non-monotonic, high-variance environments (Liu et al., 2022).

In broad learning systems (BLS) and incremental learning, MCC imparts robustness to outliers in regression/classification tasks. Incremental algorithms based on MCC leverage efficient matrix update schemes for rapid adaptation to new data or network expansion (Zheng et al., 2019).

Structured Recovery and Matrix Completion

MCC has been employed for robust matrix completion via HQ-splitting, achieving noise-insensitive low-rank matrix estimation with computational advantages over nuclear-norm or entropy-minimization methods (He et al., 2019).

Localization and Neurodynamic Optimization

MCC-based loss functions have been utilized in TOA/TDOA localization under NLOS conditions, with half-quadratic reformulation or projection-type neural network solvers enabling robust position estimation despite large outlier-induced biases (Xiong et al., 2020, Xiong et al., 2020).

5. Implementation, Kernel Bandwidth Selection, and Stability

The kernel width $\sigma$ is the pivotal hyperparameter in MCC-based algorithms, controlling the scale at which errors are judged significant. Large $\sigma$ degrades MCC to MSE, reducing robustness; small $\sigma$ aggressively rejects outliers but may reduce adaptation rate or effective sample size (Chen et al., 2015, Ma et al., 2015).

Several papers advocate annealing $\sigma$ (gradually reducing it over iterations) or setting $\sigma$ to a fraction of the error residual's standard deviation (e.g., Silverman’s rule-of-thumb). Adaptive selection strategies (e.g., as a function of error quantiles) can expedite convergence and enhance robustness in matrix completion and signal reconstruction (He et al., 2019, Zou et al., 2016).
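Both bandwidth strategies are simple to prototype. The sketch below assumes the Gaussian reference form of Silverman's rule, $1.06 \cdot \mathrm{std} \cdot N^{-1/5}$, and a geometric annealing schedule with a lower floor; the function names and the decay factor are illustrative choices, and the cited papers may use different constants or schedules.

```python
import numpy as np

def silverman_bandwidth(residuals):
    """Rule-of-thumb bandwidth: 1.06 * sample std * N^(-1/5)."""
    r = np.asarray(residuals, dtype=float)
    return float(1.06 * np.std(r, ddof=1) * r.size ** (-1.0 / 5.0))

def annealed_bandwidths(sigma_start, sigma_min, decay, n_steps):
    """Geometric schedule from sigma_start downward, never below sigma_min."""
    return [max(sigma_min, sigma_start * decay**k) for k in range(n_steps)]

sigma_rule = silverman_bandwidth([-2.0, -1.0, 0.0, 1.0, 2.0])
schedule = annealed_bandwidths(sigma_start=4.0, sigma_min=0.5, decay=0.5, n_steps=5)
```

Starting wide and shrinking lets early iterations behave like least squares, which helps avoid poor local optima, before the narrower kernel begins rejecting outliers. Because the sample standard deviation is itself sensitive to outliers, a quantile-based scale estimate may be preferable on heavily contaminated residuals.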

Stability of MCC algorithms (mean or mean-square sense) is typically ensured under step-size or gain matrix parameter bounds analogous to their MSE-based counterparts, with explicit formulas derived for various architectures (Peng et al., 2016, Qin et al., 2022, He et al., 2017). The fixed-point mapping arising in Kalman and reweighted MCC solvers enjoys geometric convergence under contraction conditions on the derivative (Chen et al., 2015, Kulikova, 2023).

6. Extensions, Variants, and Practical Considerations

Recent work generalizes MCC to allow for a nonzero kernel center (MCC-VC) to account for bias in the error distribution, yielding improved robustness in nonzero-mean or skewed noise scenarios. The variable center and bandwidth are jointly optimized by alternating minimization or matching the kernel to the empirical error PDF (Chen et al., 2019).
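Reading the variable-center idea as a shift of the Gaussian kernel, the criterion can be written in the notation of Section 1 as follows. The symbol $c$ for the kernel center is introduced here for illustration, and the exact formulation in the cited work may differ.

```latex
J_{\mathrm{MCC\text{-}VC}}(\theta, c, \sigma)
  = \frac{1}{N} \sum_{i=1}^N \exp\left( -\frac{(e_i(\theta) - c)^2}{2\sigma^2} \right),
  \qquad e_i(\theta) = y_i - f_\theta(x_i)
```

Setting $c = 0$ recovers $J_{\mathrm{MCC}}(\theta)$, so the standard criterion is the special case in which the error distribution is assumed to peak at zero.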

MCC has also been used in value decomposition for reinforcement learning, robustifying target TD-losses in multi-agent or non-monotonic reward settings (Liu et al., 2022). In system identification with noisy inputs, bias-compensated extensions of MCC restore unbiasedness while preserving robustness to output outliers (Ma et al., 2017).

7. Summary Table: MCC Loss and Algorithmic Integration

| Domain / Task | MCC-based Objective | Key Robustness/Algorithmic Modification | Cited Papers |
|---|---|---|---|
| Adaptive filtering | $E\left[\exp\left(-\frac{e^2}{2\sigma^2}\right)\right]$ | Kernel-weighted updates | (Peng et al., 2016, Lu et al., 2016) |
| Sparse estimation/CS | MCC + $\ell_0$/$\ell_1$ or CIM penalty | Mini-batch, zero-attraction, half-quadratic splitting | (Ma et al., 2015, He et al., 2017) |
| Distributed estimation | Local MCC cost in diffusion framework | Block-combine/Adapt structure | (Ma et al., 2015) |
| Kalman filtering | Gaussian kernel on innovation | Fixed-point update, gain scaling by a kernel-weighted scalar | (Chen et al., 2015, Kulikova, 2023) |
| Matrix completion | MCC loss over observed entries | HQ-optimization, adaptive bandwidth selection | (He et al., 2019) |
| Machine learning/regression | Empirical MCC, correntropy loss | Iterative reweighting, fixed-point, broad/incremental learning | (Zheng et al., 2019, Jing et al., 2021) |
| Robust localization | MCC loss on residuals | HQ AM/GTRS/MSNN; NLOS outlier rejection via influence saturation | (Xiong et al., 2020, Xiong et al., 2020) |

This table summarizes the core optimization targets and distinguishing algorithmic features of MCC-related methodologies in various signal processing and learning domains.


In summary, the maximum correntropy criterion constitutes a robust, kernel-based generalization of classical quadratic estimators, yielding statistically principled and computationally efficient solutions in the presence of impulsive, heavy-tailed, and outlier-contaminated data. MCC provides a flexible, theoretically grounded framework for robust estimation, learning, and control, with broad applicability in high-impact engineering and machine learning contexts (Chen et al., 2015, Chen et al., 2017, He et al., 2017, He et al., 2019, Jing et al., 2021, Kulikova, 2023).
