Log-Law for Translation Quality
- Log-law for translation quality is defined by two formulations: a sequential power law that decays translation accuracy with hops and a logarithmic error tolerance model that scales acceptable errors with text length.
- The sequential power law model uses parameters like the semantic-divergence exponent and RMSE metrics to quantify accuracy decay over multiple translation hops in diverse language chains.
- The logarithmic error tolerance model, grounded in psychophysical and cognitive theories, offers a superior alternative to linear models by accurately predicting acceptable error rates based on sample length.
The log-law for translation quality characterizes empirically validated, non-linear relationships between quantitative measures of translation performance, sample size, and process complexity. Two principal log-law formulations are currently prominent: (1) the power-law decay of accuracy under sequential machine translation hops and (2) the logarithmic growth of acceptable error tolerances with evaluation sample length. These laws provide predictive and psychophysically grounded models for translation system performance and evaluation, supporting both human and AI-driven processes (Sequeira et al., 2020, Gladkoff et al., 17 Nov 2025).
1. Log-Law Models for Translation Quality
There exist two rigorously defined log-law regimes in translation quality research:
- Sequential Multi-Hop Power Law: In sequential machine translation, accuracy decays as a power law with respect to the number of translation "hops":
$$A(n) = C\,n^{-\alpha}$$
where $n$ is the number of hops, $C$ is a normalization constant, and $\alpha$ is a semantic-divergence exponent dependent on the inter-language distance and chain diversity. The metric $A(n)$ is the accumulated GLEU score relative to the source (Sequeira et al., 2020).
- Logarithmic Error Tolerance Law: The maximum acceptable translation error count $E_{\max}$ for a segment of length $x$ words grows logarithmically:
$$E_{\max}(x) = a\,\ln(1 + b\,x)$$
with $a$ and $b$ calibrated to tolerance points. This model is empirically validated with evaluation data from multiple large enterprises and aligns with psychophysical (Weber-Fechner) and cognitive (Cognitive Load Theory) bases for perceptual tolerance (Gladkoff et al., 17 Nov 2025).
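As a minimal sketch, the two regimes can be written as plain functions. The functional forms $A(n) = C\,n^{-\alpha}$ and $E_{\max}(x) = a\ln(1 + b x)$, the function names, and all parameter values below are assumptions chosen for illustration; they are not fitted values from either cited study.

```python
import math


def power_law_accuracy(n_hops: int, c: float, alpha: float) -> float:
    """Accumulated accuracy after n_hops hops, assuming A(n) = C * n**(-alpha)."""
    if n_hops < 1:
        raise ValueError("n_hops must be at least 1")
    return c * n_hops ** (-alpha)


def log_error_tolerance(length: float, a: float, b: float) -> float:
    """Maximum acceptable error count, assuming E(x) = a * ln(1 + b * x)."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return a * math.log1p(b * length)


# Hypothetical parameters, for illustration only.
C, ALPHA = 1.0, 0.5
A_COEF, B_COEF = 2.0, 0.1

for n in (1, 4, 16):
    print(f"hops={n:3d}  accuracy={power_law_accuracy(n, C, ALPHA):.3f}")

for x in (100, 1000, 10000):
    print(f"length={x:6d}  tolerance={log_error_tolerance(x, A_COEF, B_COEF):.3f}")
```

Under these assumed forms, accuracy falls quickly over the first hops and then flattens, while the tolerance keeps growing with length but by ever smaller increments.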
2. Theoretical Foundations
The sequential power law (AEL) emerges from empirical observation of compound translation error in multi-step, multi-language neural machine translation (NMT) pipelines. Each translation hop introduces stochastic degradation, and the accumulated effect displays a power-law form over many hops. The exponent quantifies aggregate semantic divergence along a translation path and is higher for sequences mixing distant language families, but lower within closely related groups (Sequeira et al., 2020).
The logarithmic error-tolerance law is grounded in psychophysical and cognitive theories. The Weber–Fechner law quantifies diminishing subjective sensitivity to repeated stimuli (e.g., translation errors), motivating a logarithmic relationship. Cognitive Load Theory similarly predicts that the disruptive impact of incremental errors saturates sub-linearly with sample length, justifying the logarithmic scaling for penalty tolerance (Gladkoff et al., 17 Nov 2025).
3. Experimental Validation
Sequential Power Law:
The empirical studies of Sequeira et al., conducted with Google Translate (2019 API), used literary texts (English and Portuguese excerpts) and translation chains of up to 284 hops. Multiple chain families were examined:
- 71-language "random" chains (diverse languages)
- 7-language "common" chains (closely related languages)
- 7-language "mixed" chains (distant language families)
Parameter estimation minimized RMSE between observed GLEU-accumulation and the power-law model. Results indicate:
- For 71-language random: fitted exponent … (typical RMSE …)
- For 7-language common: fitted exponent … (RMSE …)
- For 7-language mixed: fitted exponent … (RMSE …)
The law holds robustly in the intermediate-hop regime. Small deviations occur at very low and very high hop counts but remain within typical error margins (Sequeira et al., 2020).
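The RMSE-minimizing fit described above can be sketched as follows, assuming the model $A(n) = C\,n^{-\alpha}$. The grid search over the exponent, the function name `fit_power_law`, and the synthetic data are illustrative assumptions; this is not the estimation code used by Sequeira et al., and the data are generated from the model itself rather than measured GLEU scores.

```python
import math


def fit_power_law(hops, scores, alpha_lo=0.0, alpha_hi=3.0, steps=3000):
    """Return (C, alpha, rmse) minimizing RMSE for the model C * n**(-alpha).

    For each candidate alpha on a uniform grid, the least-squares C has a
    closed form, so only alpha needs to be searched.
    """
    best = None
    for i in range(steps + 1):
        alpha = alpha_lo + (alpha_hi - alpha_lo) * i / steps
        basis = [n ** (-alpha) for n in hops]
        c = sum(y * g for y, g in zip(scores, basis)) / sum(g * g for g in basis)
        sq_err = sum((y - c * g) ** 2 for y, g in zip(scores, basis))
        rmse = math.sqrt(sq_err / len(hops))
        if best is None or rmse < best[2]:
            best = (c, alpha, rmse)
    return best


# Synthetic, noise-free scores generated from hypothetical parameters.
TRUE_C, TRUE_ALPHA = 0.9, 0.4
hops = list(range(1, 51))
scores = [TRUE_C * n ** (-TRUE_ALPHA) for n in hops]

c_hat, alpha_hat, rmse = fit_power_law(hops, scores)
print(f"C = {c_hat:.3f}, alpha = {alpha_hat:.3f}, RMSE = {rmse:.2e}")
```

A grid search keeps the sketch dependency-free; with real, noisy scores a general-purpose nonlinear least-squares routine would be a natural replacement.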
Logarithmic Error Law:
Tolerance calibration data from three major translation buyers revealed that acceptable minor-error allowances followed a logarithmic, not linear, curve:
- For Client 1 (sample length measured in pages), the logarithmic fit attains $R^2 = 0.945$ and RMSE $= 0.471$.
- Linear models yield $R^2 = 0.044$ and RMSE $= 1.955$, substantially underperforming. Similar findings hold for Clients 2 and 3. Calibration leverages two (or more) empirical anchor points and solves for the model parameters via root-finding or constrained least squares. The Client 1 fit statistics are summarized in the table below.
| Client | Model | $R^2$ | RMSE |
|---|---|---|---|
| Client 1 | Logarithmic | 0.945 | 0.471 |
| Client 1 | Linear | 0.044 | 1.955 |
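To make the two fit statistics concrete, the sketch below computes $R^2$ and RMSE for a logarithmic curve and for a least-squares line through the origin. Everything here is assumed for illustration: the model form $a\ln(1 + b x)$, the parameter values, the through-origin linear model, and the synthetic tolerance data (a log curve rounded to whole error counts). It does not reproduce the client data or the figures in the table.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    sq_err = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return math.sqrt(sq_err / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


# Synthetic tolerances: a hypothetical log curve rounded to whole error counts.
A, B = 3.0, 0.5
pages = list(range(1, 21))
allowed = [round(A * math.log1p(B * x)) for x in pages]

# Logarithmic predictions from the generating curve.
log_pred = [A * math.log1p(B * x) for x in pages]

# Least-squares line through the origin, E(x) = k * x.
k = sum(x * y for x, y in zip(pages, allowed)) / sum(x * x for x in pages)
lin_pred = [k * x for x in pages]

for name, pred in (("logarithmic", log_pred), ("linear", lin_pred)):
    print(f"{name:12s} R^2 = {r_squared(allowed, pred):.3f}  RMSE = {rmse(allowed, pred):.3f}")
```

On data of this shape the line overshoots long samples and undershoots short ones, which is the pattern of misfit the comparison is meant to expose.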
4. Calibration and Integration in Evaluation Frameworks
For translation quality scoring (e.g., MQM, CAT, LQA workflows), calibration proceeds as follows:
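- Collect two tolerance pairs $(x_1, E_1)$, $(x_2, E_2)$.
- Solve $\ln(1 + b x_1) / \ln(1 + b x_2) = r$ for $b$, where $r = E_1 / E_2$.
- Compute $a = E_1 / \ln(1 + b x_1)$.
- The scoring pipeline replaces static linear tolerances with the computed dynamic $E_{\max}(x) = a \ln(1 + b x)$.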
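The calibration steps above can be sketched with a simple bisection root-finder, assuming the tolerance model $E(x) = a\ln(1 + b x)$. The function name `calibrate_log_tolerance`, the bracketing strategy, and the anchor values in the example are hypothetical; the source only states that root-finding or constrained least squares is used.

```python
import math


def calibrate_log_tolerance(x1, e1, x2, e2):
    """Solve for (a, b) in E(x) = a * ln(1 + b * x) from two anchor points."""
    if not (0 < x1 < x2 and 0 < e1 < e2):
        raise ValueError("anchors must satisfy 0 < x1 < x2 and 0 < e1 < e2")
    r = e1 / e2
    # A solution exists only if tolerance grows more slowly than length.
    if not (x1 / x2 < r < 1):
        raise ValueError("anchors are not consistent with sub-linear growth")

    def ratio(b):
        return math.log1p(b * x1) / math.log1p(b * x2)

    # ratio(b) rises from x1/x2 (b -> 0) towards 1 (b -> infinity),
    # so expand the upper bracket until it crosses r, then bisect.
    lo, hi = 1e-12, 1.0
    while ratio(hi) < r:
        hi *= 2.0
        if hi > 1e300:
            raise ValueError("could not bracket a solution for b")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if ratio(mid) < r:
            lo = mid
        else:
            hi = mid
    b = 0.5 * (lo + hi)
    a = e1 / math.log1p(b * x1)
    return a, b


# Hypothetical anchors: 2 errors allowed at 1 page, 6 errors at 10 pages.
a, b = calibrate_log_tolerance(1.0, 2.0, 10.0, 6.0)
print(f"a = {a:.4f}, b = {b:.4f}")
print(f"tolerance at 5 pages: {a * math.log1p(b * 5.0):.2f}")
```

Bisection is used here because the ratio is monotone in $b$, so a bracket is easy to find; with more than two anchor points, a least-squares fit over all pairs would take its place.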
Outside the fidelity window of the linear model, the log-law is required to avoid structured bias against short or long evaluation samples. The model integrates seamlessly with standard scorecards, requiring only a revised penalty threshold calculation, with all subsequent stages unchanged (Gladkoff et al., 17 Nov 2025).
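The bias described above can be illustrated with a small sketch that swaps only the threshold calculation in a pass/fail check. The model form $a\ln(1 + b x)$, the parameter values, the reference length, and the function names are all assumptions for illustration and do not correspond to any particular scorecard implementation.

```python
import math


def dynamic_tolerance(length, a, b):
    """Length-dependent threshold, assuming E(x) = a * ln(1 + b * x)."""
    return a * math.log1p(b * length)


def linear_tolerance(length, rate):
    """Static per-unit threshold, E(x) = rate * x."""
    return rate * length


def passes(error_count, tolerance):
    """A sample passes when its error count does not exceed the threshold."""
    return error_count <= tolerance


# Hypothetical calibration and reference length (in pages).
A, B = 3.0, 0.5
REF_PAGES = 5.0
# Linear rate chosen so both models agree at the reference length.
RATE = dynamic_tolerance(REF_PAGES, A, B) / REF_PAGES

for pages in (1.0, 5.0, 20.0):
    log_tol = dynamic_tolerance(pages, A, B)
    lin_tol = linear_tolerance(pages, RATE)
    print(f"pages={pages:5.1f}  log={log_tol:6.2f}  linear={lin_tol:6.2f}")

# A one-page sample with a single minor error under each threshold.
print(passes(1, dynamic_tolerance(1.0, A, B)), passes(1, linear_tolerance(1.0, RATE)))
```

With these assumed values the linear threshold is stricter than the logarithmic one below the reference length and more lenient above it, so only the threshold function changes while the pass/fail comparison stays the same.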