Maximum Correntropy Criterion

Updated 10 April 2026

Maximum correntropy criterion is an information-theoretic method that uses a Gaussian kernel to down-weight outliers and impulsive noise.
It is widely applied in adaptive filtering, compressive sensing, and system identification to achieve improved convergence and robustness.
Its implementation involves iterative reweighted optimization and careful kernel bandwidth selection to balance sensitivity and stability.

The maximum correntropy criterion (MCC) is a robust, information-theoretic alternative to traditional quadratic optimality criteria such as mean square error (MSE), extensively utilized in signal processing, adaptive filtering, robust statistics, compressive sensing, learning systems, and control theory. MCC replaces the quadratic penalty on errors with a local Gaussian kernel measure, yielding resistance to impulsive noise and outliers, enhanced robustness in heavy-tailed environments, and improved convergence in a wide range of practical applications.

1. Definition and Mathematical Formulation

Correntropy is a similarity measure between two scalar random variables $X$ and $Y$, defined as the expectation of a positive-definite kernel evaluated at $X - Y$. The most prevalent choice is the Gaussian kernel: $V_\sigma(X, Y) = E\left[ \kappa_\sigma(X, Y) \right] = E\left[ \exp \left( -\frac{(X - Y)^2}{2\sigma^2} \right) \right]$, where $\sigma > 0$ denotes the kernel bandwidth. In empirical settings with samples $\{(x_i, y_i)\}_{i=1}^N$, the sample correntropy estimator is

$\widehat{V}_\sigma(X, Y) = \frac{1}{N} \sum_{i=1}^N \exp \left( -\frac{(x_i - y_i)^2}{2\sigma^2} \right)$

The maximum correntropy criterion seeks model parameters that maximize sample correntropy between predictions and observations. In regression,

$J_{\mathrm{MCC}}(\theta) = \frac{1}{N} \sum_{i=1}^N \exp\left( -\frac{e_i(\theta)^2}{2\sigma^2} \right), \quad e_i(\theta) = y_i - f_\theta(x_i)$

This is typically recast as minimizing the correntropy-induced loss $\ell_\sigma(e) = 1 - \exp\left(-\frac{e^2}{2\sigma^2}\right)$, which is bounded and has a redescending influence function (the Welsch $M$-estimator loss).
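The two quantities defined above can be sketched numerically as follows. This is an illustrative sketch only; the function names `sample_correntropy` and `correntropy_loss` and the toy data are assumptions made for the example, not part of any cited work.

```python
import numpy as np

def sample_correntropy(x, y, sigma):
    """Sample correntropy: mean Gaussian kernel of the differences x_i - y_i."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.mean(np.exp(-d**2 / (2.0 * sigma**2))))

def correntropy_loss(e, sigma):
    """Correntropy-induced (Welsch) loss 1 - exp(-e^2 / (2 sigma^2)), elementwise."""
    e = np.asarray(e, dtype=float)
    return 1.0 - np.exp(-e**2 / (2.0 * sigma**2))

# Hypothetical toy data: the last prediction is a gross outlier.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 40.0])

print(sample_correntropy(y_true, y_pred, sigma=1.0))
# The per-sample loss never exceeds 1, however large the error is.
print(correntropy_loss(y_true - y_pred, sigma=1.0))
```

Because each per-sample loss is capped at 1, the outlier contributes no more to the total than any other badly fitted point, in contrast to a squared error of roughly 1300 for the same sample.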

The kernel parameter $\sigma$ calibrates the trade-off between sensitivity and robustness: as $\sigma \to \infty$, MCC reduces to MSE; as $\sigma \to 0$, the loss becomes akin to zero-one loss, aggressively disregarding large deviations (Chen et al., 2015, Chen et al., 2017, Jing et al., 2021).
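The two limiting regimes can be checked with a short Taylor expansion of the correntropy-induced loss $\ell_\sigma(e) = 1 - \exp\left(-\frac{e^2}{2\sigma^2}\right)$. The rescaling by $2\sigma^2$ is introduced here only to make the large-bandwidth limit explicit; it does not change the minimizer.

```latex
\ell_\sigma(e) = 1 - \exp\!\left(-\frac{e^2}{2\sigma^2}\right)
             = \frac{e^2}{2\sigma^2} - \frac{e^4}{8\sigma^4} + O\!\left(\frac{e^6}{\sigma^6}\right)
\quad\Longrightarrow\quad
\lim_{\sigma \to \infty} 2\sigma^2\, \ell_\sigma(e) = e^2,
\qquad
\lim_{\sigma \to 0^+} \ell_\sigma(e) =
\begin{cases} 0, & e = 0 \\ 1, & e \neq 0 \end{cases}
```

For large bandwidth the leading term is the squared error up to a constant factor, while for vanishing bandwidth every nonzero error is penalized equally.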

2. Robustness Properties and Theoretical Analysis

MCC exhibits fundamental robustness due to the exponential decay of the Gaussian kernel. Outliers, i.e., errors with $|e_i| \gg \sigma$, receive exponentially vanishing weight and exert negligible influence on the estimator. Thus, MCC is highly robust with respect to impulsive and heavy-tailed noise, outperforming quadratic or linear losses in the presence of outliers (Chen et al., 2017).
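The vanishing influence of large errors can be illustrated through the derivative of the correntropy-induced loss, $\psi_\sigma(e) = \frac{e}{\sigma^2}\exp\left(-\frac{e^2}{2\sigma^2}\right)$, which peaks at $|e| = \sigma$ and then decays to zero. The sketch below is illustrative; the function name `mcc_influence` and the sample error values are assumptions.

```python
import numpy as np

def mcc_influence(e, sigma):
    """Derivative of 1 - exp(-e^2 / (2 sigma^2)) with respect to e."""
    e = np.asarray(e, dtype=float)
    return (e / sigma**2) * np.exp(-e**2 / (2.0 * sigma**2))

errors = np.array([0.1, 1.0, 3.0, 10.0])   # hypothetical residuals
print("MCC influence:", mcc_influence(errors, sigma=1.0))
print("MSE influence:", 2.0 * errors)       # grows without bound
```

Under squared error the residual of 10 pulls on the estimate 100 times harder than the residual of 0.1; under MCC with $\sigma = 1$ its pull is numerically negligible.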

Formal robustness guarantees have been established: in errors-in-variables (EIV) models with both input and output outliers, the MCC solution remains within a bounded interval around the true parameter, provided that a majority of samples are "good" and the kernel width $\sigma$ is appropriately chosen. The width of this interval depends on the bounds on the input and output noise of the good samples and on a factor that decreases as the fraction of inliers grows (Chen et al., 2017). MCC can thus tolerate arbitrarily large outliers if a sufficient inlier majority exists.

Statistical learning theory for MCC regression (MCCR) demonstrates that, with a suitably chosen scale parameter $\sigma$, the estimator achieves the optimal learning rate, outperforming the Huber loss and least squares in regimes contaminated by outliers (Jing et al., 2021). Empirical results confirm these theoretical predictions.

3. Algorithmic Integration and Optimization Schemes

Adaptive Filtering and Sparsity

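MCC is commonly integrated into adaptive filtering as a robust alternative to LMS or RLS. The stochastic-gradient update takes the form $\mathbf{w}_{k+1} = \mathbf{w}_k + \mu \exp\left(-\frac{e_k^2}{2\sigma^2}\right) e_k \mathbf{x}_k$, where $e_k = d_k - \mathbf{w}_k^\top \mathbf{x}_k$ is the a priori error and $\mu$ is the step size (He et al., 2017, Peng et al., 2016).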
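A minimal sketch of such a filter is given below. It assumes the standard stochastic-gradient form of the MCC update, $\mathbf{w} \leftarrow \mathbf{w} + \mu \exp\left(-\frac{e^2}{2\sigma^2}\right) e\, \mathbf{x}$, which is the LMS update scaled by the Gaussian kernel of the current error. The function name `mcc_lms`, the parameter values, and the synthetic impulsive-noise data are illustrative assumptions.

```python
import numpy as np

def mcc_lms(X, d, mu=0.05, sigma=2.0):
    """Stochastic-gradient MCC adaptive filter.

    X: (T, M) array of input regressors, d: (T,) desired signal.
    Returns the weight vector after one pass over the data.
    """
    X = np.asarray(X, dtype=float)
    d = np.asarray(d, dtype=float)
    w = np.zeros(X.shape[1])
    for x_k, d_k in zip(X, d):
        e_k = d_k - w @ x_k                              # a priori error
        kernel = np.exp(-e_k**2 / (2.0 * sigma**2))      # near 0 for impulsive errors
        w = w + mu * kernel * e_k * x_k                  # kernel-scaled LMS step
    return w

# Hypothetical system identification problem with 5% impulsive noise.
rng = np.random.default_rng(0)
w_true = np.array([0.5, -1.0, 2.0])
X = rng.standard_normal((5000, 3))
noise = 0.05 * rng.standard_normal(5000)
impulses = rng.random(5000) < 0.05
noise[impulses] += 50.0 * rng.standard_normal(impulses.sum())
d = X @ w_true + noise

print("estimated weights:", mcc_lms(X, d))
print("true weights:     ", w_true)
```

The only difference from plain LMS is the `kernel` factor, so the per-sample cost stays linear in the filter length.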

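For sparsity-inducing applications (e.g., compressive sensing, channel estimation), $\ell_1$ or $\ell_0$ regularization, or the correntropy-induced metric (CIM), is added: $\min_{\mathbf{w}} \frac{1}{N} \sum_{i=1}^N \ell_\sigma\left(e_i(\mathbf{w})\right) + \lambda R(\mathbf{w})$, where $R$ denotes the chosen sparsity penalty and $\lambda > 0$ its weight. Proximal, zero-attraction, or reweighted penalty techniques are utilized for non-differentiable terms (Ma et al., 2015, He et al., 2017).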
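As one illustration of the zero-attraction idea, the sketch below adds an $\ell_1$ subgradient term to a kernel-weighted LMS step, so that every coefficient is pulled toward zero by a small constant amount at each iteration. This is a simplified sketch under stated assumptions, not the exact algorithm of any cited paper; the function name `za_mcc_lms`, the attraction strength `rho`, and the test channel are hypothetical.

```python
import numpy as np

def za_mcc_lms(X, d, mu=0.05, sigma=2.0, rho=1e-4):
    """Kernel-weighted LMS step plus an l1 zero-attractor (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    d = np.asarray(d, dtype=float)
    w = np.zeros(X.shape[1])
    for x_k, d_k in zip(X, d):
        e_k = d_k - w @ x_k
        kernel = np.exp(-e_k**2 / (2.0 * sigma**2))
        # Robust data-fit step, then shrink each coefficient toward zero.
        w = w + mu * kernel * e_k * x_k - rho * np.sign(w)
    return w

# Hypothetical sparse channel: 2 active taps out of 8.
rng = np.random.default_rng(0)
w_true = np.array([0.0, 0.0, 1.0, 0.0, 0.0, -0.5, 0.0, 0.0])
X = rng.standard_normal((5000, 8))
d = X @ w_true + 0.05 * rng.standard_normal(5000)

print(np.round(za_mcc_lms(X, d), 3))
```

The attractor introduces a small bias on the active taps, on the order of `rho / mu`, in exchange for suppressing noise on the inactive ones.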

Distributed and Recursive Estimation

Diffusion MCC algorithms for distributed networks maintain the same per-iteration complexity as standard LMS, with robustness to impulsive noise and mean/mean-square convergence guaranteed under standard step-size conditions (Ma et al., 2015). Recursive MCC (RMCC) and its sparsity-aware extensions utilize gain matrices and forgetting factors for rapid adaptation in nonstationary or sparse settings (Qin et al., 2022).

Kalman and Nonlinear Filtering

MCC has been embedded in the Kalman filtering framework (MCC-KF, MCKF) by replacing the classical MMSE update with a fixed-point or contraction mapping induced by the MCC loss, in which the gain is scaled by a scalar factor computed via Gaussian kernel weights on the innovation terms (Chen et al., 2015, Kulikova, 2023). Square-root implementations (Cholesky, UD, SVD-based) further enhance numerical robustness in high-reliability settings (Kulikova, 2023).

Half-Quadratic and Reweighted Solvers

Optimization of MCC objectives, being non-quadratic, often employs half-quadratic (HQ) or iterative reweighted least-squares approaches. At each iteration:

Compute residuals $e_i$ and weights $\omega_i = \exp\left(-\frac{e_i^2}{2\sigma^2}\right)$.
Solve a weighted least-squares (possibly with regularization or constraints).
Alternate weight and parameter updates until convergence (He et al., 2019, Zou et al., 2016).
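The three steps above can be sketched for a linear model as follows. This is a minimal sketch assuming Gaussian-kernel weights $\exp\left(-\frac{e_i^2}{2\sigma^2}\right)$ and an ordinary least-squares starting point; the function name `mcc_irls`, the tiny ridge term (added only to keep the linear system solvable if all weights become very small), and the synthetic data are assumptions.

```python
import numpy as np

def mcc_irls(X, y, sigma=1.0, n_iter=50, tol=1e-8, ridge=1e-8):
    """Iteratively reweighted least squares for a linear MCC regression."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    theta = np.linalg.lstsq(X, y, rcond=None)[0]        # ordinary LS start
    for _ in range(n_iter):
        e = y - X @ theta                               # step 1: residuals
        w = np.exp(-e**2 / (2.0 * sigma**2))            #         and kernel weights
        A = X.T @ (w[:, None] * X) + ridge * np.eye(X.shape[1])
        b = X.T @ (w * y)
        theta_new = np.linalg.solve(A, b)               # step 2: weighted LS
        converged = np.linalg.norm(theta_new - theta) <= tol * (1.0 + np.linalg.norm(theta))
        theta = theta_new
        if converged:                                   # step 3: repeat until stable
            break
    return theta

# Hypothetical data: y = 1 + 2x with small noise and 10% gross outliers.
rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, 200)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(200)
outliers = rng.choice(200, size=20, replace=False)
y[outliers] += 30.0 * rng.standard_normal(20)

print("MCC-IRLS:", mcc_irls(X, y, sigma=1.0))
print("plain LS:", np.linalg.lstsq(X, y, rcond=None)[0])
```

Since the objective is non-convex, the result can depend on the starting point and on $\sigma$; a bandwidth that is very small relative to the initial residuals leaves almost no sample with meaningful weight.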

These methods are employed in robust regression, matrix completion, and broad learning systems (Zheng et al., 2019, He et al., 2019).

4. Applications

Robust Estimation and System Identification

MCC algorithms have demonstrated superior performance in system identification under impulsive, heavy-tailed, or phase noise. Constrained MCC filters (CMCC) offer robust alternatives for scenarios with linear constraints, achieving lower steady-state mean-square deviation (MSD) than constrained LMS or RLS under non-Gaussian noise (Peng et al., 2016, Ma et al., 2017).

Sparse Adaptive Filtering and Channel Estimation

Sparsity-promoting MCC algorithms (e.g., CIMMCC, $\ell_0$-MCC) are state-of-the-art for robust sparse channel estimation in non-Gaussian environments, achieving faster tracking and lower steady-state MSD than MSE-based or conventional norm-penalized methods (Ma et al., 2015, He et al., 2017).

Distributed Estimation and Beamforming

Diffusion MCC and recursive MCC algorithms provide robust alternatives in sensor networks or beamforming, particularly in the presence of $\alpha$-stable or impulsive disturbance (Ma et al., 2015, Lu et al., 2016). The kernel width $\sigma$ is a critical tuning parameter, dictating the trade-off between adaptation speed and robustness.

Robust Machine Learning and Value Decomposition

MCC has been applied to deep reinforcement learning for robust value decomposition (MCVD), replacing fixed or ad-hoc weighting in TD-error loss with the dynamic, error-adaptive MCC weight, yielding robust performance across non-monotonic, high-variance environments (Liu et al., 2022).

In broad learning systems (BLS) and incremental learning, MCC imparts robustness to outliers in regression/classification tasks. Incremental algorithms based on MCC leverage efficient matrix update schemes for rapid adaptation to new data or network expansion (Zheng et al., 2019).

Structured Recovery and Matrix Completion

MCC has been employed for robust matrix completion via HQ-splitting, achieving noise-insensitive low-rank matrix estimation with computational advantages over nuclear-norm or entropy-minimization methods (He et al., 2019).

Localization and Neurodynamic Optimization

MCC-based loss functions have been utilized in TOA/TDOA localization under NLOS conditions, with half-quadratic reformulation or projection-type neural network solvers enabling robust position estimation despite large outlier-induced biases (Xiong et al., 2020, Xiong et al., 2020).

5. Implementation, Kernel Bandwidth Selection, and Stability

The kernel width $\sigma$ is the pivotal hyperparameter in MCC-based algorithms, controlling the scale at which errors are judged significant. Large $\sigma$ degrades MCC to MSE, reducing robustness; small $\sigma$ aggressively rejects outliers but may reduce adaptation rate or effective sample size (Chen et al., 2015, Ma et al., 2015).

Several papers advocate annealing $\sigma$ (gradually reducing it over iterations) or setting $\sigma$ to a fraction of the error residual's standard deviation (e.g., Silverman's rule-of-thumb). Adaptive selection strategies (e.g., as a function of error quantiles) can expedite convergence and enhance robustness in matrix completion and signal reconstruction (He et al., 2019, Zou et al., 2016).
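Both heuristics can be sketched in a few lines. The first function uses the common robust form of Silverman's rule, $0.9 \min(\hat{s}, \mathrm{IQR}/1.34)\, N^{-1/5}$, applied to the current residuals; the second uses a geometric decay with a floor. The function names, the choice of a geometric schedule, and all parameter values are assumptions made for illustration; the cited papers may use different schedules.

```python
import numpy as np

def silverman_bandwidth(residuals):
    """Silverman's rule of thumb, using the smaller of std and IQR/1.34 as scale."""
    e = np.asarray(residuals, dtype=float)
    std = np.std(e, ddof=1)
    q75, q25 = np.percentile(e, [75, 25])
    iqr = q75 - q25
    scale = min(std, iqr / 1.34) if iqr > 0 else std
    return 0.9 * scale * e.size ** (-0.2)

def annealed_bandwidth(sigma0, sigma_min, decay, iteration):
    """Geometric annealing: start wide (MSE-like), shrink toward sigma_min."""
    return max(sigma_min, sigma0 * decay**iteration)

# Hypothetical residuals with one gross outlier.
residuals = np.append(np.linspace(-1.0, 1.0, 100), 500.0)
print("Silverman bandwidth:", silverman_bandwidth(residuals))
print("annealing schedule: ", [annealed_bandwidth(4.0, 0.5, 0.5, k) for k in range(6)])
```

Using the interquartile range keeps the bandwidth estimate from being inflated by the very outliers that MCC is meant to suppress.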

Stability of MCC algorithms (mean or mean-square sense) is typically ensured under step-size or gain matrix parameter bounds analogous to their MSE-based counterparts, with explicit formulas derived for various architectures (Peng et al., 2016, Qin et al., 2022, He et al., 2017). The fixed-point mapping arising in Kalman and reweighted MCC solvers enjoys geometric convergence under contraction conditions on the derivative (Chen et al., 2015, Kulikova, 2023).

6. Extensions, Variants, and Practical Considerations

Recent work generalizes MCC to allow for a nonzero kernel center (MCC-VC) to account for bias in the error distribution, yielding improved robustness in nonzero-mean or skewed noise scenarios. The variable center and bandwidth are jointly optimized by alternating minimization or matching the kernel to the empirical error PDF (Chen et al., 2019).
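A shifted kernel of this kind can be sketched as $\exp\left(-\frac{(e - c)^2}{2\sigma^2}\right)$ with center $c$. The example below estimates $c$ for a fixed bandwidth by a kernel-weighted mean fixed-point iteration, which is the stationarity condition of the sample correntropy with respect to $c$. This is an illustrative assumption, not the procedure of Chen et al. (2019); the function names and the error values are hypothetical.

```python
import numpy as np

def vc_correntropy(errors, center, sigma):
    """Sample correntropy with a Gaussian kernel centered at `center`."""
    e = np.asarray(errors, dtype=float)
    return float(np.mean(np.exp(-(e - center)**2 / (2.0 * sigma**2))))

def kernel_center(errors, sigma, n_iter=100, tol=1e-10):
    """Fixed-point estimate of the center maximizing vc_correntropy for fixed sigma."""
    e = np.asarray(errors, dtype=float)
    c = float(np.median(e))                      # robust starting point
    for _ in range(n_iter):
        w = np.exp(-(e - c)**2 / (2.0 * sigma**2))
        c_new = float(np.sum(w * e) / np.sum(w)) # kernel-weighted mean of the errors
        if abs(c_new - c) < tol:
            return c_new
        c = c_new
    return c

# Hypothetical biased errors clustered near 2 with one gross outlier.
errors = np.array([1.9, 2.0, 2.1, 2.0, 50.0])
c_hat = kernel_center(errors, sigma=0.5)
print("estimated center:", c_hat)
print("correntropy at center vs. at zero:",
      vc_correntropy(errors, c_hat, 0.5), vc_correntropy(errors, 0.0, 0.5))
```

With a zero-centered kernel these biased errors would all receive low weight; shifting the center to the bulk of the error distribution restores their contribution while still ignoring the outlier.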

MCC has also been used in value decomposition for reinforcement learning, robustifying target TD-losses in multi-agent or non-monotonic reward settings (Liu et al., 2022). In system identification with noisy inputs, bias-compensated extensions of MCC restore unbiasedness while preserving robustness to output outliers (Ma et al., 2017).

7. Summary Table: MCC Loss and Algorithmic Integration

Domain / Task	MCC-based Objective	Key Robustness/Algorithmic Modification	Cited Papers
Adaptive filtering	$E\left[\exp\left(-\frac{e^2}{2\sigma^2}\right)\right]$ (maximized over filter weights)	Kernel-weighted updates	(Peng et al., 2016, Lu et al., 2016)
Sparse estimation/CS	MCC + $\ell_0$/$\ell_1$ or CIM penalty	Mini-batch, zero-attraction, half-quadratic splitting	(Ma et al., 2015, He et al., 2017)
Distributed estimation	Local MCC cost in diffusion framework	Block-combine/Adapt structure	(Ma et al., 2015)
Kalman filtering	Gaussian kernel on innovation	Fixed-point update, kernel-weighted gain scaling	(Chen et al., 2015, Kulikova, 2023)
Matrix completion	MCC loss over observed entries	HQ-optimization, adaptive bandwidth selection	(He et al., 2019)
Machine learning/regression	Empirical MCC, correntropy loss	Iterative reweighting, fixed-point, broad/incremental learning	(Zheng et al., 2019, Jing et al., 2021)
Robust localization	MCC loss on residuals, HQ AM/GTRS/MSNN	NLOS outlier rejection via influence saturation	(Xiong et al., 2020, Xiong et al., 2020)

This table summarizes the core optimization targets and distinguishing algorithmic features of MCC-related methodologies in various signal processing and learning domains.

In summary, the maximum correntropy criterion constitutes a robust, kernel-based generalization of classical quadratic estimators, yielding statistically principled and computationally efficient solutions in the presence of impulsive, heavy-tailed, and outlier-contaminated data. MCC provides a flexible, theoretically grounded framework for robust estimation, learning, and control, with broad applicability in high-impact engineering and machine learning contexts (Chen et al., 2015, Chen et al., 2017, He et al., 2017, He et al., 2019, Jing et al., 2021, Kulikova, 2023).