Quran Phonetic Script (QPS) Overview
- QPS is a specialized two-layer phonetic formalism that encodes both Arabic letters and critical articulatory features specific to Tajweed rules.
- It underpins computational assessment tasks, including ASR, mispronunciation detection, and rule-based evaluation of recitation, with low reported phoneme error rates.
- QPS supports interactive educational platforms and precise benchmarking, bridging traditional recitation practices with modern digital analysis.
Quran Phonetic Script (QPS) denotes a specialized formalism for the phonetic representation, computational modeling, and assessment of Quranic recitation. QPS is uniquely tailored to the requirements of Tajweed—the domain-specific articulation rules formalized by classical Quranic recitation scholarship—and diverges substantially from mainstream systems such as the International Phonetic Alphabet (IPA) or Standard Arabic phonetizations. Modern research leverages QPS within automatic speech recognition (ASR), mispronunciation detection, diacritic restoration, educational platforms, and machine learning pipelines to quantify and correct recitation errors at both segmental and suprasegmental levels.
1. Historical Context and Evolution
Quranic orthography has evolved from an oral tradition, with early written artifacts serving as memory aids by superimposing diacritical marks atop the consonantal skeleton. This facilitated the emergence of elaborate systems encoding Tajweed rules, guiding assimilation, vowel lengthening (madd), and pausal processes directly within the script (Martínez, 16 May 2025). The Contemporary Quranic Orthography (CQO), especially as encoded in the Cairo Qur'an, exhibits a systematic deployment of these markings, making it a linchpin for comparative and algorithmic analysis of Quranic phonetic processes.
Digital strategies now model these rules computationally, using cascades of regular-expression rewriting for adding or removing Tajweed notations, enabling accurate mapping and cross-manuscript alignment (Martínez, 16 May 2025).
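A minimal sketch of such a rewrite cascade is shown below, assuming simple Unicode-mark stripping rules; the patterns and the sample string are illustrative placeholders rather than the rule set described in (Martínez, 16 May 2025).

```python
import re

# Illustrative cascade of ordered regular-expression rewrites for stripping
# Tajweed-related notation before cross-manuscript alignment. The rules are
# placeholders, not the rule set of the cited work.
STRIP_RULES = [
    (re.compile("\u06DF"), ""),  # small high rounded zero (letter not pronounced)
    (re.compile("\u0653"), ""),  # maddah sign above (lengthening marker)
]

def apply_cascade(text: str, rules) -> str:
    """Apply rewrite rules in order; later rules operate on earlier output."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text

# "Alif Lam Meem" written with explicit maddah marks; the cascade removes them.
print(apply_cascade("\u0627\u0644\u0653\u0645\u0653", STRIP_RULES))
```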
2. Formal Structure of QPS
QPS is a two-layer system integrating both phoneme and attribute (Sifa/Sifat) levels (Abdelfattah et al., 27 Aug 2025). The phoneme layer encodes Arabic letters, short and long vowels, and diacritic-specific features, including recitation-dependent phenomena such as Madd duration (encoded as strings like “IIII” for a four-beat Madd). The attribute layer represents 10 critical articulatory features: hams_or_jahr (voicing), shidda_or_rakhawa (tension), tafkheem_or_tarqeeq (emphasis), itbaq, safeer, qalqla, tikraar, tafashie, istitala, and ghonna (nasalization), thus capturing both segmental and suprasegmental phonetics necessary for canonical recitation (Abdelfattah et al., 27 Aug 2025).
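The sketch below shows one way such a two-layer token could be represented in code; the class, field names, and attribute values are assumptions for illustration, not the published QPS encoding.

```python
from dataclasses import dataclass, field
from typing import Dict

# Illustrative container for one QPS unit: a phoneme-layer symbol plus its
# Sifa-layer attribute values. Names and values are assumed for illustration.
@dataclass
class QPSUnit:
    phoneme: str                      # a letter/vowel symbol, or "IIII" for a four-beat Madd
    sifat: Dict[str, str] = field(default_factory=dict)

qaf_with_qalqala = QPSUnit(
    phoneme="q",
    sifat={
        "hams_or_jahr": "jahr",            # voiced
        "shidda_or_rakhawa": "shidda",     # plosive/tense
        "tafkheem_or_tarqeeq": "tafkheem", # emphatic
        "qalqla": "present",               # echoing release
        "ghonna": "absent",                # no nasalization
    },
)
```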
The representation stands in contrast to phoneme sets optimized for Modern Standard Arabic (MSA), such as the 68-phoneme inventory used in MSA pronunciation benchmarks composed of vowels, consonants, diphones, and explicit gemination (Kheir et al., 9 Jun 2025).
3. QPS in Computational Assessment and ASR
Recent computational systems deploy QPS as an explicit output or target for ASR and mispronunciation detection, especially for recitation error quantification (Abdelfattah et al., 27 Aug 2025). Systems typically utilize deep feature encoders (e.g., wav2vec2-BERT), with segmentation at pause points (waqf) and transcript verification via specialized algorithms (e.g., Tasmeea). In multi-level CTC models, predictions for both the phoneme and Sifa layers are emitted, and the per-level CTC losses are aggregated into a single training objective, e.g. $\mathcal{L}_{\text{total}} = \sum_{\ell} \lambda_{\ell}\,\mathcal{L}_{\text{CTC}}^{(\ell)}$, where $\ell$ ranges over the phoneme and Sifa prediction heads.
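A conceptual sketch of this per-level aggregation in PyTorch is given below; the encoder dimension, vocabulary sizes, and equal loss weights are assumptions for illustration, not the published model configuration.

```python
import torch
import torch.nn as nn

# Conceptual sketch of per-level CTC loss aggregation for a two-layer QPS
# target (phoneme layer + Sifa layer). Dimensions and weights are placeholders.
class MultiLevelCTCHead(nn.Module):
    def __init__(self, encoder_dim=1024, phoneme_vocab=50, sifa_vocab=25):
        super().__init__()
        self.phoneme_head = nn.Linear(encoder_dim, phoneme_vocab)
        self.sifa_head = nn.Linear(encoder_dim, sifa_vocab)
        self.ctc = nn.CTCLoss(blank=0, zero_infinity=True)

    def forward(self, encoder_out, ph_targets, sifa_targets,
                input_lengths, ph_lengths, sifa_lengths, weights=(1.0, 1.0)):
        # nn.CTCLoss expects log-probabilities of shape (T, N, C).
        ph_logp = self.phoneme_head(encoder_out).log_softmax(-1).transpose(0, 1)
        sf_logp = self.sifa_head(encoder_out).log_softmax(-1).transpose(0, 1)
        loss_ph = self.ctc(ph_logp, ph_targets, input_lengths, ph_lengths)
        loss_sf = self.ctc(sf_logp, sifa_targets, input_lengths, sifa_lengths)
        return weights[0] * loss_ph + weights[1] * loss_sf  # per-level aggregation
```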
The average reported Phoneme Error Rate (PER) is 0.16%, attesting to the model's efficacy in distinguishing subtle Tajweed errors (Abdelfattah et al., 27 Aug 2025).
QPS-based ASR supports direct feedback and error localization at both the phoneme and articulatory-attribute (Sifa) levels, which is not achievable with conventional IPA-based systems (Abdelfattah et al., 27 Aug 2025).
4. Integration with Machine Learning for Pronunciation Assessment
Paradigms for evaluating Tajweed recitation incorporate CNNs (EfficientNet-B0 + Squeeze-and-Excitation), RNNs (LSTM), and SVM classifiers, all utilizing QPS or context-specific phonetic targets (Shaiakhmetov et al., 30 Mar 2025, Harere et al., 2023, Alagrami et al., 2020). Input audio is typically transformed into normalized mel-spectrograms or MFCC features, which capture the time-frequency structure critical for detecting Tajweed rules.
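This front end can be sketched as follows with librosa; the file name and parameter values are illustrative defaults rather than the settings used in the cited systems.

```python
import librosa

# Illustrative front end: load a recitation clip and compute the normalized
# log-mel spectrogram and MFCC features that typically feed the CNN/LSTM/SVM
# classifiers. The file name and parameters are placeholders.
y, sr = librosa.load("recitation_segment.wav", sr=16000)

mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=400, hop_length=160, n_mels=80)
log_mel = librosa.power_to_db(mel)
log_mel = (log_mel - log_mel.mean()) / (log_mel.std() + 1e-8)  # per-utterance normalization

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # compact features for SVM-style pipelines
```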
Reported accuracies on QDAT-based Tajweed rule classification (Separate Stretching, Tight Noon, Hide) via EfficientNet-B0 reach 95.35–99.34%, while LSTM architectures similarly achieve 95–96% for rule-level detection (Shaiakhmetov et al., 30 Mar 2025, Harere et al., 2023). SVM-based pipelines with filter-bank features yield high correct prediction rates (~99%) for rule validation (Alagrami et al., 2020).
These approaches enable immediate, interactive feedback in educational systems, scaling access to effective Tajweed instruction.
5. Orthographic Mapping, Diacritization, and Phonetic Transcription
Orthographic-to-phonetic mapping in Qur'anic contexts (sometimes termed Phonetic Orthographical Transcription, POT) relies on both lexicon-based and context-sensitive rule-based approaches (Hanane et al., 2014). Transformation rules encode relationships among left and right graphemic contexts, diacritical markers, and the phonemic outcome, typically as context-sensitive rewrite rules of the general form $g \rightarrow p \;/\; C_{\text{left}}\,\_\,C_{\text{right}}$ (grapheme $g$ is realized as phoneme $p$ only in the stated context).
For Qur'an, rule systems must be enriched to encode Tajweed-specific phenomena: Madd, Idgham, Ghunnah, etc., with detailed context markers and exception lexicons (Hanane et al., 2014). Such models accommodate intricate recitational cues—missing in Standard Arabic TTS systems or MBROLA SAMPA-coded pipelines.
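To illustrate the lexicon-plus-rules arrangement, the sketch below checks a hypothetical exception lexicon before falling back to a context-sensitive rewrite (here, an iqlab-style rule for noon before baa); the lexicon entry, rule, and output symbols are placeholders, not the rule set of (Hanane et al., 2014).

```python
import re

# Sketch of a lexicon-first, rule-fallback orthography-to-phonetics step.
# The lexicon entry, the rule, and the phonetic symbols are hypothetical.
EXCEPTION_LEXICON = {
    "\u0627\u0644\u0644\u0647": "a l l a: h",  # whole-word exception checked before rules
}

# Context-sensitive rule: noon (optionally with sukun) followed by baa is
# rewritten as a nasalized meem, an iqlab-style substitution.
CONTEXT_RULES = [
    (re.compile("\u0646(?=\u0652?\u0628)"), "m~"),
]

def transcribe(word: str) -> str:
    if word in EXCEPTION_LEXICON:
        return EXCEPTION_LEXICON[word]
    for pattern, phone in CONTEXT_RULES:
        word = pattern.sub(phone, word)
    return word
```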
Advanced dialectal sound and vowelization recovery, as demonstrated by the DSVR framework, employs vector quantization and self-supervised representation learning with transformer models, improving character error rate by ∼7% on the ArabVoice15 test set for dialect-rich phonetic transcription (Kheir et al., 5 Aug 2024).
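As a conceptual illustration of the vector-quantization step, the sketch below clusters stand-in frame embeddings into a discrete codebook; the random array substitutes for real transformer features, and the codebook size is arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans

# Conceptual sketch of discretizing self-supervised speech features into a
# codebook of units. The random array stands in for transformer frame
# embeddings; 128 is an arbitrary codebook size.
frame_embeddings = np.random.randn(5000, 768)        # (frames, feature_dim)

codebook = KMeans(n_clusters=128, n_init=10, random_state=0).fit(frame_embeddings)
discrete_units = codebook.predict(frame_embeddings)  # one codebook index per frame
```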
6. Educational Applications and Benchmarking
QPS supports the development of rich, interactive educational platforms (e.g., QVoice) that provide learners, especially non-native speakers, with granular phonetic transcriptions and transliterations, real-time character- and word-level feedback, and mitigation of dialectal interference via attention mechanisms (Kheir et al., 2023). The framework formalizes both the mapping from learner speech to its phonetic representation and the scoring functions that grade each attempt.
QuranMB.v1 provides a unified benchmark containing systematically annotated mispronunciation patterns, leveraging a controlled phoneme inventory (68 phonemes) and rigorous evaluation metrics (F1-score, PER) for model comparison (Kheir et al., 9 Jun 2025). Baseline multilingual and monolingual models (Wav2vec2, HuBERT, WavLM, mHuBERT) demonstrate upper-bound F1-scores near 30%, highlighting ongoing challenges in fine-grained pronunciation assessment (Kheir et al., 9 Jun 2025).
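Phoneme Error Rate in such benchmarks is conventionally the length-normalized edit distance between predicted and reference phoneme sequences; a minimal sketch with toy placeholder sequences follows.

```python
def phoneme_error_rate(reference, hypothesis):
    """Length-normalized Levenshtein distance between two phoneme sequences."""
    d = [[0] * (len(hypothesis) + 1) for _ in range(len(reference) + 1)]
    for i in range(len(reference) + 1):
        d[i][0] = i
    for j in range(len(hypothesis) + 1):
        d[0][j] = j
    for i in range(1, len(reference) + 1):
        for j in range(1, len(hypothesis) + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / max(len(reference), 1)

# Toy example with placeholder phoneme labels, not benchmark data.
per = phoneme_error_rate(["q", "a", "l", "a"], ["q", "a", "l", "aa"])  # 0.25
```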
7. Future Directions and Open Challenges
Open research areas include expanding annotated QPS datasets beyond the 300K utterances and 850+ hours now available (Abdelfattah et al., 27 Aug 2025), integrating transformer and multimodal models into ASR and educational pipelines, and formalizing Tajweed rule representation for broader recitation phenomena. Standardization across manuscripts and dialects remains challenging due to fine-grained variability and the sacred nature of the text; the use of CQO and the Cairo Qur'an as reference points supports textual alignment and comparative study (Martínez, 16 May 2025).
Emergent frameworks for short vowel restoration and dialectal sound recognition—combining self-supervised embeddings with discrete codebooks—offer new tools for precision in Quran Phonetic Script development, with potential to address both phonetic diversity and recitational fidelity (Kheir et al., 5 Aug 2024).
In summary, Quran Phonetic Script is a domain-specific formalism employed in automated recitation assessment, computational orthography, educational technology, and manuscript comparison. Its two-layer structure, which encodes both segments and articulatory attributes, and its integration into modern machine learning pipelines underpin the state of the art in Quranic phonetic modeling, while persistent challenges in data coverage, dialectal complexity, and rule formalization drive current research.