
Mouth Aspect Ratio (MAR)

Updated 7 December 2025
  • Mouth Aspect Ratio (MAR) is a geometric metric defined as the ratio of vertical inner lip distances to the horizontal mouth width, capturing mouth opening.
  • MAR is computed using facial landmark detection (via dlib's 68-point model), Euclidean distance calculations, and optional temporal smoothing for video analysis.
  • MAR is combined with the Eye Aspect Ratio (EAR) in machine learning classifiers like XGBoost to reliably detect fatigue and yawn events in real time.

The Mouth Aspect Ratio (MAR) is a geometric metric that quantifies mouth opening by measuring specific distances between defined inner lip landmarks in a two-dimensional facial image. MAR serves as a key feature in facial analysis applications such as fatigue and yawn detection, where it is combined with other indicators, most notably the Eye Aspect Ratio (EAR), to enable robust real-time recognition of states related to drowsiness and alertness. MAR is employed as a principal input to supervised classifiers such as XGBoost ensembles for binary classification of fatigue from facial images, as demonstrated in contemporary machine learning models (Chen et al., 2023).

1. Mathematical Definition of MAR

Let $P_1,\ldots,P_6 \in \mathbb{R}^2$ denote the ordered two-dimensional coordinates of six inner lip landmarks. The MAR is the ratio of the average vertical inner mouth opening to the horizontal mouth width, calculated as

$$\mathrm{MAR} = \frac{\|P_2 - P_6\|_2 + \|P_3 - P_5\|_2}{2\,\|P_1 - P_4\|_2},$$

where for points $P_i = (x_i, y_i)$ and $P_j = (x_j, y_j)$,

$$\|P_i - P_j\|_2 = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}.$$

The specific assignments are:

  • $P_1$, $P_4$ — left/right mouth corners (horizontal span),
  • $P_2$, $P_6$ and $P_3$, $P_5$ — paired vertical landmarks capturing mouth opening.
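
As a minimal numeric sketch of this formula (the pixel coordinates below are made-up illustrative values, not measurements from a real image), the computation can be written directly in Python:

import numpy as np

def mouth_aspect_ratio(p1, p2, p3, p4, p5, p6):
    """Compute MAR from six 2-D lip landmarks given as NumPy arrays."""
    vertical = np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)
    horizontal = np.linalg.norm(p1 - p4)
    return vertical / (2.0 * horizontal)

# Illustrative landmarks: mouth width 60 px, both vertical openings 30 px.
P1, P4 = np.array([0.0, 25.0]), np.array([60.0, 25.0])   # mouth corners
P2, P6 = np.array([20.0, 10.0]), np.array([20.0, 40.0])  # first vertical pair
P3, P5 = np.array([40.0, 10.0]), np.array([40.0, 40.0])  # second vertical pair

print(mouth_aspect_ratio(P1, P2, P3, P4, P5, P6))  # (30 + 30) / (2 * 60) = 0.5

With these values the mouth opening is half the mouth width, giving MAR = 0.5, which sits exactly at the example yawn threshold discussed later.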

2. Landmark Acquisition and MAR Computation

Computation of MAR from facial images proceeds through the following steps:

  1. Image Preparation: Each acquired frame (static or video) is converted to grayscale, typically via OpenCV routines, enhancing feature detector robustness.
  2. Facial Landmark Detection: dlib's 68-point shape predictor is applied to the grayscale image, producing an array of (x, y) coordinates for facial landmarks.
  3. Landmark Indexing: The six MAR-relevant inner lip points are mapped to corresponding dlib indices (the conversion to 0-based array indices is sketched just after this list). For practical assignments:
    • Mouth corners ($P_1$, $P_4$): dlib points 49 and 55,
    • Vertical pairs: ($P_2$, $P_6$): points 52 and 58, ($P_3$, $P_5$): points 53 and 57.
  4. Metric Calculation: Euclidean distances are computed for the identified landmark pairs as per the MAR definition; the ratio is then assembled from these measurements.
  5. Thresholding (Yawn/Fatigue Detection): A typical threshold, e.g., MAR > 0.5 in a single frame, is used for flagging wide mouth opening. For continuous video, either frame counting above threshold or aggregation (such as a rolling mean) is employed.
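
Because dlib's shape predictor returns a 0-indexed array of landmarks while the point numbers above follow the usual 1-based 68-point numbering, the indices must be shifted by one before use; a short sketch of that mapping (the paper-style names P1-P6 are only variable labels):

# 1-based landmark numbers quoted above, keyed by the paper-style point names.
LANDMARKS_1BASED = {"P1": 49, "P4": 55, "P2": 52, "P6": 58, "P3": 53, "P5": 57}
# Shift to the 0-based indices used when indexing dlib's landmark array.
LANDMARK_IDX = {name: n - 1 for name, n in LANDMARKS_1BASED.items()}
# -> {'P1': 48, 'P4': 54, 'P2': 51, 'P6': 57, 'P3': 52, 'P5': 56}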

3. Time-Series Processing and Smoothing

For static images, no explicit temporal smoothing is applied to the MAR values. In video-based scenarios, optional temporal smoothing (e.g., simple moving average, low-pass filter) can be introduced to mitigate high-frequency noise and frame-to-frame variability:

$$\overline{\mathrm{MAR}_t} = \frac{1}{N} \sum_{k=t-N+1}^{t} \mathrm{MAR}_k$$

Such smoothing increases the stability of features before classification, especially prior to thresholding or for the generation of time-series derived fatigue indicators.
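
A minimal sketch of this rolling mean, applied frame by frame (the window length N = 5 is an arbitrary illustrative choice, not a value prescribed by the source):

from collections import deque

class MarSmoother:
    """Simple moving average over the last N per-frame MAR values."""
    def __init__(self, window=5):
        self.buffer = deque(maxlen=window)

    def update(self, mar):
        """Add the latest MAR value and return the current rolling mean."""
        self.buffer.append(mar)
        return sum(self.buffer) / len(self.buffer)

smoother = MarSmoother(window=5)
# smoothed_mar = smoother.update(mar)   # call once per processed frame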

4. Integration with XGBoost Fatigue Recognition

The primary supervised classifier utilizing MAR is an XGBoost ensemble, operating in binary:logistic mode. MAR is paired with EAR to form a two-dimensional feature vector per sample:

  • Feature vector: $[\mathrm{EAR}, \mathrm{MAR}]$
  • Model configuration:
    • Number of trees: 2000
    • Maximum depth: 6
    • Loss: Second-order Taylor expansion with regularization
  • Training/test split: 70%/30%

The dual-predictor vector, without explicit normalization or scaling, is input to the trained classifier. Combined system performance achieves 87.37% accuracy and 89.14% sensitivity on the fatigue-vs-non-fatigue classification benchmark. The individual contribution of MAR is not quantifiably isolated, but both features are observed to act as equally informative predictors in the model (Chen et al., 2023).
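
A minimal training sketch under the reported settings (2000 trees, depth 6, binary:logistic objective, 70/30 split); the feature files, random seed, and any hyperparameters beyond those listed are assumptions for illustration, not values from the source:

import numpy as np
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Hypothetical files holding one [EAR, MAR] row per sample and 0/1 fatigue labels.
X = np.load("ear_mar_features.npy")
y = np.load("fatigue_labels.npy")

# 70%/30% train/test split, as reported.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

clf = XGBClassifier(n_estimators=2000, max_depth=6, objective="binary:logistic")
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))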

5. Implementation Workflow and Pseudocode

The following Python sketch summarizes the operational pipeline for MAR extraction and preparation of samples to be classified by XGBoost; the shape-predictor model path and the video source are placeholders:

import cv2
import dlib
import numpy as np

# The shape-predictor model path and the capture source are placeholders.
face_detector = dlib.get_frontal_face_detector()
shape_predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def shape_to_numpy(shape):
    """Convert a dlib landmark result to an Nx2 NumPy array of (x, y) points."""
    return np.array([[shape.part(i).x, shape.part(i).y] for i in range(shape.num_parts)])

# Map the paper's P1…P6 to 0-based array indices
# (1-based points 49, 52, 53, 55, 57, 58 minus one).
IDX = {"P1": 48, "P2": 51, "P3": 52, "P4": 54, "P5": 56, "P6": 57}

mar_list = []
video_stream = cv2.VideoCapture(0)               # placeholder source (webcam)
while True:
    ok, frame = video_stream.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_detector(gray)                  # dlib face detector
    for face in faces:
        shape = shape_predictor(gray, face)      # dlib 68-point landmarks
        coords = shape_to_numpy(shape)           # Nx2 array
        P1, P4 = coords[IDX["P1"]], coords[IDX["P4"]]
        P2, P6 = coords[IDX["P2"]], coords[IDX["P6"]]
        P3, P5 = coords[IDX["P3"]], coords[IDX["P5"]]
        # MAR = (|P2 - P6| + |P3 - P5|) / (2 * |P1 - P4|)
        v = np.linalg.norm(P2 - P6) + np.linalg.norm(P3 - P5)
        h = np.linalg.norm(P1 - P4)
        mar_list.append(v / (2.0 * h))

After extraction, each feature vector $[\mathrm{EAR}, \mathrm{MAR}]$ is optionally normalized, then passed into the trained XGBoost model for prediction.
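
A minimal inference sketch, assuming clf is the fitted XGBClassifier from the Section 4 sketch; the per-frame EAR and MAR values below are made-up illustrative numbers:

import numpy as np

ear, mar = 0.25, 0.60                            # illustrative per-frame values
features = np.array([[ear, mar]])                # same column order as in training
label = int(clf.predict(features)[0])            # assumed coding: 1 = fatigue, 0 = non-fatigue
prob = float(clf.predict_proba(features)[0, 1])  # probability assigned to the fatigue class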

6. Thresholding, Labeling, and Practical Considerations

A single-frame MAR threshold (e.g., 0.5) enables detection of physiologically significant events such as yawning. For temporal sequences, event detection uses frame count above threshold, or the rolling mean of MAR values cross-referenced against the predefined threshold. No ablation isolating MAR's sole detection efficacy is provided, but it is designated a core, non-redundant contributor alongside EAR.
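
A minimal sketch of the frame-counting variant (the 0.5 threshold is the example value quoted above; the 15-frame run length is an assumed illustrative parameter, and mar_list is the per-frame series produced in Section 5):

MAR_THRESHOLD = 0.5      # example single-frame threshold from the text
MIN_CONSEC_FRAMES = 15   # illustrative run length, not a value from the source

consecutive = 0
yawn_events = 0
for mar in mar_list:
    if mar > MAR_THRESHOLD:
        consecutive += 1
        if consecutive == MIN_CONSEC_FRAMES:
            yawn_events += 1   # count each sustained opening once
    else:
        consecutive = 0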

A plausible implication is that MAR, while intuitively interpretable as a robust facial movement descriptor, requires accurate landmark localization and consistent index mapping to deliver reliable results, particularly under challenging imaging conditions.

7. Summary Table: MAR Workflow Components

Step                      | Tool/Method          | Notes
Facial landmark detection | dlib shape predictor | 68-point model
Landmark mapping          | dlib indices         | Points 49/55, 52/58, 53/57
Distance calculation      | Euclidean norm       | As per the explicit formula
MAR computation           | Formulaic ratio      | Vertical/horizontal pairings
Classifier input          | XGBoost (EAR, MAR)   | No explicit scaling applied

This workflow enables MAR-driven automated fatigue and yawn detection with high classification performance, contributing alongside EAR to binary facial state recognition pipelines (Chen et al., 2023).
