AraBART+Morph+GEC for Arabic Grammatical Correction

Updated 25 November 2025
  • The paper introduces AraBART+Morph+GEC, which combines a BART-based encoder-decoder model with detailed morphological embeddings and a GED objective to improve Arabic error correction.
  • It employs a refined edit selection pipeline using logistic regression, agreement boosting, and non-maximum suppression, achieving up to 84.64% F₀.₅ on QALB-15 benchmarks.
  • Serving as a key component of the ArbESC+ ensemble, the system leverages both neural and linguistic features to set a new state-of-the-art for Arabic grammatical error correction.

AraBART+Morph+GEC is an Arabic grammatical error correction (GEC) system integrating a BART-based sequence-to-sequence architecture, explicit morphological analysis, and a parallel grammatical error detection (GED) objective. Developed as a key component of the ArbESC+ multi-system edit selection framework, it leverages both neural and linguistic features to address the challenges of morphologically rich and syntactically complex Arabic text. The system combines span-based edit proposals from independently trained variants, enabling fine-grained correction decisions within a larger ensemble strategy (Alrehili et al., 18 Nov 2025).

1. Architecture of AraBART+Morph+GEC

1.1 Base AraBART Backbone

AraBART employs the encoder–decoder “denoising” transformer originally proposed by Lewis et al. (2019), re-pretrained on extensive Arabic corpora as described in Antoun et al. (2020). Arabic-specific modifications include a BPE vocabulary (approximately 42,000 tokens), script-level adjustments for right-to-left text, and orthographic normalization. Pretraining objectives follow standard BART, encompassing masked token infilling, masked span infilling, and sentence permutation, all adapted for Arabic data.
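As a purely illustrative sketch, a backbone of this kind can be loaded and run with the Hugging Face transformers library; the checkpoint identifier below is an assumption, since the source does not name a specific public AraBART release.

```python
# Hypothetical usage sketch: loading an AraBART-style seq2seq backbone with
# Hugging Face transformers. The checkpoint id is an assumption, not from the paper.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

ckpt = "moussaKam/AraBART"  # assumed public checkpoint name
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSeq2SeqLM.from_pretrained(ckpt)

src = "جملة عربية قد تحتوي على أخطاء إملائية ونحوية"  # noisy Arabic input
inputs = tokenizer(src, return_tensors="pt")
out = model.generate(**inputs, num_beams=5, max_length=100)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```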

1.2 Morphological Feature Integration

Morphological information is introduced via CAMeL Tools’ MADA+ analyzer. For each input token position i, discrete features are extracted:

  • POS tag t_i \in T^{\mathrm{POS}}
  • Stem s_i
  • Root r_i
  • Additional attributes f_i \in F (number, gender, case, etc.)

These features are embedded as follows:

E^m_{(i)} = E^{\mathrm{POS}}[t_i] + E^{\mathrm{stem}}[s_i] + E^{\mathrm{root}}[r_i] + \sum_{f\in F} E^f[f_i]

The encoder’s input vector at each position is

h^{(0)}_i = E^{\text{tok}}_{(i)} + E^{\text{pos}}_i + E^m_{(i)}

Optionally, internal layers inject morphological embeddings into the multi-head self-attention keys and values,

Q^\ell = W_Q h^{(\ell-1)} \;;\; K^\ell = W_K h^{(\ell-1)} + W_m E^m \;;\; V^\ell = W_V h^{(\ell-1)} + W'_m E^m

enabling direct incorporation of morphological cues in attention computations.
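A minimal PyTorch sketch of the morphological embedding E^m follows, assuming discrete feature IDs produced by an external analyzer; the class name, vocabulary sizes, and feature set are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MorphEmbedding(nn.Module):
    """Sketch of E^m: a sum of per-feature embedding lookups (POS, stem, root,
    plus additional attributes). Vocabulary sizes are illustrative assumptions."""
    def __init__(self, d_model, n_pos, n_stem, n_root, attr_sizes):
        super().__init__()
        self.pos = nn.Embedding(n_pos, d_model)
        self.stem = nn.Embedding(n_stem, d_model)
        self.root = nn.Embedding(n_root, d_model)
        self.attrs = nn.ModuleList(nn.Embedding(n, d_model) for n in attr_sizes)

    def forward(self, pos_ids, stem_ids, root_ids, attr_ids):
        # attr_ids: list of id tensors, one per attribute f in F
        e = self.pos(pos_ids) + self.stem(stem_ids) + self.root(root_ids)
        for emb, ids in zip(self.attrs, attr_ids):
            e = e + emb(ids)
        return e  # E^m, added to token + positional embeddings at the encoder input
```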

1.3 GEC-specific Multi-task Objectives

The model is trained for both text generation and error detection:

  • Sequence generation with standard cross-entropy loss:

L_{\text{seq}} = -\sum_t \log p(y_t \mid y_{<t}, x)

  • Grammatical error detection (GED):

L_{\text{GED}} = -\sum_i \left[ g_i \log \sigma(u_i) + (1-g_i)\log(1-\sigma(u_i)) \right]

where u_i is the GED logit for token i, g_i is its gold label, and \sigma is the sigmoid function. The full objective is L = L_{\text{seq}} + \lambda L_{\text{GED}} with \lambda = 1.0.
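The combined objective can be sketched as below; the tensor shapes and the GED head are assumptions, since the source only specifies the two loss terms and \lambda = 1.0.

```python
import torch
import torch.nn.functional as F

def multitask_loss(decoder_logits, target_ids, ged_logits, ged_labels,
                   lam=1.0, pad_id=-100):
    """Sketch of L = L_seq + lambda * L_GED.
    decoder_logits: (batch, tgt_len, vocab); ged_logits: (batch, src_len)."""
    # Sequence generation: token-level cross-entropy over the decoder output
    l_seq = F.cross_entropy(
        decoder_logits.view(-1, decoder_logits.size(-1)),
        target_ids.view(-1),
        ignore_index=pad_id,
    )
    # Grammatical error detection: binary cross-entropy per source token
    l_ged = F.binary_cross_entropy_with_logits(ged_logits, ged_labels.float())
    return l_seq + lam * l_ged
```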

1.4 Training Regime and Hyperparameters

  • Data: QALB-2014, QALB-2015, ZAEBUC corpora (joint/separate variants)
  • Optimization: AdamW, learning rate 2 \times 10^{-5}, weight decay 0.01
  • Batch size: 16, mixed precision (fp16)
  • Training epochs: 50, early stopping via development set
  • Inference: beam search, beam size 5, max output length 100
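The reported setup can be sketched in PyTorch as follows; `model` and the data pipeline are assumed to exist, and only the hyperparameter values listed above are taken from the source.

```python
import torch

# Sketch of the reported training configuration; `model` and the data loaders
# are assumed to be defined elsewhere (e.g., the multitask model above).
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)
scaler = torch.cuda.amp.GradScaler()        # mixed-precision (fp16) training
num_epochs, batch_size = 50, 16             # with early stopping on the dev set
generation_kwargs = dict(num_beams=5, max_length=100)   # beam-search inference
```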

2. Generation and Featurization of Correction Proposals

2.1 Candidate Edit Extraction

At inference, three independently trained AraBART+Morph+GEC models (corresponding to the QALB-14, QALB-15, and ZAEBUC domains) generate corrected sentences. Source-to-output alignments yield proposed span edits e = (a, b, r), interpreted as replacements of source tokens [a..b-1] by the string r.
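A minimal sketch of span-edit extraction is shown below; difflib's SequenceMatcher is used as a stand-in aligner because the source does not specify its alignment algorithm, so the resulting spans may be coarser than those used in the paper.

```python
import difflib

def extract_edits(src_tokens, hyp_tokens):
    """Derive span edits e = (a, b, r): replace source tokens [a..b-1] with string r.
    difflib.SequenceMatcher is a stand-in for the paper's (unspecified) aligner."""
    edits = []
    sm = difflib.SequenceMatcher(a=src_tokens, b=hyp_tokens, autojunk=False)
    for tag, a, b, c, d in sm.get_opcodes():
        if tag != "equal":                       # replace, delete, or insert
            edits.append((a, b, " ".join(hyp_tokens[c:d])))
    return edits

# extract_edits("ذهبت الى المدرسه".split(), "ذهبت إلى المدرسة".split())
# -> [(1, 3, 'إلى المدرسة')] with this toy aligner (spans may be coarser than the paper's)
```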

2.2 Numerical Feature Representation

For each edit, multiple features are computed:

  • System confidence: For a proposal e from system k, the normalized probability mass assigned to edits containing e across output beams,

c_k(e) = \sum_{b\in\text{beams}} \mathbf{1}[e\in E(y_b)] \cdot \mathrm{softmax}_b(s_b)

  • Morphological consistency: M_C(e) \in [0,1] measures alignment between the replacement r and the MADA+-predicted gold features:

M_C(e) = \frac{1}{|F|}\sum_{f\in F}\mathbf{1}[f(\text{predicted on }r) = f(\text{gold})]

  • Span features: Size of the replaced span (b - a) and length of r.
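The two edit-level features above can be sketched as follows, assuming each system exposes its beam hypotheses with scores and that morphological analyses are available as attribute dictionaries; both data representations are illustrative assumptions.

```python
import math

def system_confidence(edit, beam_outputs):
    """c_k(e): softmax-normalised probability mass of beams whose edit set contains e.
    beam_outputs: list of (edit_set, beam_score) pairs from one system's beam search."""
    z = sum(math.exp(s) for _, s in beam_outputs)
    return sum(math.exp(s) for edits, s in beam_outputs if edit in edits) / z

def morph_consistency(pred_feats, gold_feats):
    """M_C(e): fraction of morphological attributes (number, gender, case, ...) on
    which the replacement's analysis matches the reference; dicts are illustrative."""
    return sum(pred_feats.get(f) == v for f, v in gold_feats.items()) / len(gold_feats)
```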

3. Edit Selection: Classifier and Decision Pipeline

3.1 Feature Vector Construction

A binary feature vector x_e \in \{0,1\}^{K\times T} (where K = 9 is the number of systems and T = 3 is the number of edit types) specifies which system(s) proposed e, broken down by edit type (insertion, deletion, substitution). Optionally, real-valued meta-features (system confidence, morphological consistency, span length) are appended.

3.2 Logistic Regression Scoring

Each candidate e receives a raw probability score via logistic regression,

p_{\text{raw}}(e) = \sigma(w^\top x_e + b)

optimized with binary cross-entropy on labeled edits.
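A sketch of the K×T indicator encoding and the logistic-regression scorer using scikit-learn is given below; the helper function and feature layout are assumptions consistent with the description above, not the released implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

K, T = 9, 3   # systems x edit types (insertion, deletion, substitution)

def edit_feature_vector(proposed_by, edit_type, meta=()):
    """Binary indicator block of size K*T plus optional real-valued meta-features
    (confidence, morphological consistency, span length). edit_type is 0, 1, or 2."""
    x = np.zeros(K * T)
    for k in proposed_by:                 # indices of systems that proposed this edit
        x[k * T + edit_type] = 1.0
    return np.concatenate([x, np.asarray(meta, dtype=float)])

# Trained on labelled edits (X: feature vectors, y: 1 if the edit matches the reference)
clf = LogisticRegression(max_iter=1000)
# clf.fit(X_train, y_train)
# p_raw = clf.predict_proba(X_test)[:, 1]    # sigma(w^T x_e + b)
```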

3.3 Agreement Boosting and Dual-Threshold Filtering

System agreement is quantified:

  • Number of proposing systems: n(e) = \sum_k \mathbf{1}[\text{system }k\text{ proposed }e]
  • Boost factor: \mathrm{boost}(e) = \min(1+\beta(n(e)-1),\, c)
  • Adjusted score: p_{\text{adj}}(e) = p_{\text{raw}}(e)\cdot\mathrm{boost}(e)

Candidate e is accepted if p_{\text{raw}}(e) \geq \tau and p_{\text{adj}}(e) \geq \alpha\tau, enforcing both raw confidence and agreement.
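A compact sketch of the boosting and dual-threshold rule follows; the values of \beta, the cap c, \tau, and \alpha are illustrative (the paper reports \tau \approx 0.7–0.8 as optimal, see Section 5.4).

```python
def accept_edit(p_raw, n_systems, beta=0.1, cap=1.5, tau=0.7, alpha=1.0):
    """Sketch of agreement boosting with dual-threshold filtering.
    beta, cap, tau, and alpha are illustrative defaults, not the paper's values."""
    boost = min(1.0 + beta * (n_systems - 1), cap)
    p_adj = p_raw * boost
    return (p_raw >= tau) and (p_adj >= alpha * tau), p_adj
```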

3.4 Non-Maximum Suppression for Conflict Resolution

Inter-edit overlap is measured by one-dimensional IoU:

\mathrm{IoU}(e_i,e_j) = \frac{\max(0,\min(b_i,b_j)-\max(a_i,a_j))}{(b_i-a_i)+(b_j-a_j)-\max(0,\min(b_i,b_j)-\max(a_i,a_j))}

A greedy non-maximum suppression (NMS) procedure selects the highest-p_{\text{adj}} edits while ensuring non-overlapping spans (IoU threshold \theta = 0), with at most one insertion per position.
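The span IoU and greedy NMS step can be sketched as below; the one-insertion-per-position constraint is noted but omitted for brevity, and all function names are illustrative.

```python
def span_iou(e1, e2):
    """1-D IoU between the source spans of two edits e = (a, b, r)."""
    (a1, b1, _), (a2, b2, _) = e1, e2
    inter = max(0, min(b1, b2) - max(a1, a2))
    union = (b1 - a1) + (b2 - a2) - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(scored_edits, theta=0.0):
    """Keep the highest-scoring edits whose spans do not overlap (IoU > theta).
    scored_edits: list of (p_adj, edit) pairs. The at-most-one-insertion-per-position
    rule of the full pipeline is omitted here."""
    kept = []
    for p, e in sorted(scored_edits, key=lambda t: t[0], reverse=True):
        if all(span_iou(e, k) <= theta for _, k in kept):
            kept.append((p, e))
    return kept
```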

4. System Combination in ArbESC+ Framework

4.1 Model Ensemble

The full ArbESC+ system integrates:

  • Four sequence-to-sequence GEC models: AraT5, ByT5, mT5, AraBART
  • Three AraBART+Morph+GEC models (trained on QALB-14, QALB-15, ZAEBUC)
  • Two text-editing models

This ensemble yields K = 9 candidate outputs per sentence.

4.2 Combination and Decision Pipeline

The ensemble workflow is as follows:

  1. Aggregate unique span edits from all 9 systems.
  2. Encode features for each edit as described above.
  3. Score with logistic regression.
  4. Apply agreement boosting and dual-threshold filtering.
  5. Resolve conflicts via NMS.
  6. Sequentially apply the surviving edits to the source sentence from left to right.
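The following sketch ties these steps together, reusing the helpers sketched earlier (extract_edits, edit_feature_vector, accept_edit, greedy_nms); edit_type_of and the final edit-application step are assumed helpers, and all defaults are illustrative.

```python
def combine(source_tokens, system_outputs, clf):
    """End-to-end sketch of the edit-selection pipeline described above.
    system_outputs: one tokenised hypothesis per system; clf: trained scorer."""
    # Steps 1-2: aggregate unique edits and record which systems proposed them
    proposals = {}
    for k, hyp_tokens in enumerate(system_outputs):
        for e in extract_edits(source_tokens, hyp_tokens):
            proposals.setdefault(e, set()).add(k)
    # Steps 3-4: score with logistic regression, then boost and filter
    survivors = []
    for e, systems in proposals.items():
        x = edit_feature_vector(systems, edit_type_of(e))   # edit_type_of: assumed helper
        p_raw = clf.predict_proba(x.reshape(1, -1))[0, 1]
        ok, p_adj = accept_edit(p_raw, len(systems))
        if ok:
            survivors.append((p_adj, e))
    # Step 5: resolve span conflicts
    kept = greedy_nms(survivors)
    # Step 6: return edits ordered left to right (application helper omitted)
    return sorted((e for _, e in kept), key=lambda e: e[0])
```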

4.3 Rationale for Micro-edit Level Combination

Micro-edit based voting enables fine-grained error correction where edits, rather than whole sentences, are the central decision unit. This enables contributions from high-confidence system components even when they disagree on overall sentence structure. Thresholding and agreement-based boosting limit spurious or low-confidence edits, while NMS prevents conflicting alterations on overlapping spans.

5. Empirical Performance and Ablative Analyses

5.1 Comparative Results

Model                          QALB-14    QALB-15 L1    QALB-15 L2
AraBART+Morph+GEC (2014)       76.20%     78.85%        52.00%
AraBART+Morph+GEC (2015)       77.99%     77.97%        60.98%
AraBART+Morph+GEC (ZAEBUC)     77.85%     77.73%        60.79%
ArbESC+ (all 9 combined)       82.63%     84.64%        65.55%

All values are F₀.₅ scores.

ArbESC+ outperforms single models by 4–6 F₀.₅ points across all benchmarks, establishing new state-of-the-art performance for Arabic GEC.

5.2 System Combination vs. Baselines

Majority voting, weighted voting, minimum Bayes risk (MBR), and standard ESC system combinations are all surpassed by ArbESC+ by 1–3 F₀.₅ points on each evaluation split.

5.3 Impact of the Number of Models

Ablation results show that using only the best 3–5 models achieves F₀.₅ scores of 80.71–80.77 on QALB-14, compared with 82.63 for the full 9-model ArbESC+ system. Including all 9 systems without the selection combiner yields F₀.₅ = 80.78, indicating that the edit-level combination pipeline contributes further gains.

5.4 Threshold Sensitivity

The dual-threshold filtering is sensitive: values of \tau below 0.5 admit too many low-quality edits and depress F₀.₅, whereas \tau above 0.9 sacrifices recall. Optimal values of \tau \approx 0.7–0.8 deliver the strongest results.

5.5 Effect of Morphological Features

AraBART+Morph+GEC’s explicit use of morphological embeddings and parallel GED objectives yields a ≈2 F₀.₅ point improvement over vanilla AraBART, confirming the value of linguistic feature integration for Arabic GEC model proposals.

6. Summary and Significance

AraBART+Morph+GEC augments the standard Arabic BART transformer with detailed morphological features and a grammatical error detection head, producing more accurate and linguistically informed corrections. Its independently trained variants serve as black-box proposal generators within ArbESC+, whose classifier pipeline integrates proposals from nine diverse systems, leverages model agreement, filters candidates with calibrated confidence thresholds, and resolves conflicts via span-level NMS. With final F₀.₅ scores of 82.63%, 84.64%, and 65.55% on the QALB-14, QALB-15 L1, and QALB-15 L2 benchmarks, AraBART+Morph+GEC, especially within ArbESC+, sets a new state of the art for Arabic grammatical error correction and exemplifies the impact of combining neural and morphological approaches (Alrehili et al., 18 Nov 2025).
