MeniMV: Horn-Specific Meniscal Tear Benchmark

Updated 2 June 2026

MeniMV is a multi-view, horn-specific dataset featuring paired sagittal and coronal MRI images annotated with a four-tier severity scale.
The dataset comprises 3,000 images from 750 patients collected across three centers using standardized T2WI-FS protocols for consistency.
Benchmark evaluations show that pretrained transformer models, notably Swin-UNETR, outperform traditional CNNs in precise meniscal tear grading.

MeniMV is a multi-view, horn-specific benchmark dataset introduced to advance automated severity grading of meniscal horn tears on MRI. It addresses a limitation of earlier datasets, which mostly provide coarse or binary study-level labels without precise localization or severity gradation, and so constrain the development of algorithms for clinically relevant meniscus injury analysis. MeniMV provides paired sagittal and coronal MRI images for both the anterior and posterior meniscal horns, annotated on a four-tier severity scale, establishing a new standard for musculoskeletal imaging research and benchmarking (Xu et al., 20 Dec 2025).

1. Dataset Design and Construction

MeniMV was retrospectively assembled from 3,000 horn-specific MRI images of 750 patients at three medical centers, all acquired with fat-suppressed T2-weighted sequences (T2WI-FS; sensitivity ≃ 94%). Each patient contributed one sagittal–coronal slice pair per horn (anterior and posterior), giving 1,500 slice pairs (3,000 images). Imaging parameters (3 mm slice thickness, 256×256 matrix) were standardized across institutions.

Sagittal and coronal planes were co-registered with a rigid transformation $x' = Rx + t$, where $R \in \mathbb{R}^{3\times3}$ is a rotation matrix ($R^\top R = I$, $\det R = 1$) that provides anatomical alignment and $t \in \mathbb{R}^3$ is the translation vector.
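The transform itself can be illustrated with a minimal pure-Python sketch. It only applies and inverts $x' = Rx + t$ for a chosen rotation and translation; how MeniMV estimated $R$ and $t$ is not described here, and the rotation axis, angle, and offsets below are illustrative assumptions.

```python
import math

def rotation_z(theta):
    """Return a 3x3 rotation matrix about the z-axis (axis chosen for illustration)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]

def apply_rigid(R, t, x):
    """Map a 3D point x to x' = R x + t."""
    return [sum(R[i][j] * x[j] for j in range(3)) + t[i] for i in range(3)]

def invert_rigid(R, t):
    """Return (R^T, -R^T t), the inverse transform, valid because R^T R = I."""
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    t_inv = [-sum(Rt[i][j] * t[j] for j in range(3)) for i in range(3)]
    return Rt, t_inv

# Illustrative parameters: 90-degree rotation and a translation in millimetres.
R = rotation_z(math.pi / 2)
t = [10.0, 0.0, -5.0]
p = [1.0, 0.0, 0.0]

q = apply_rigid(R, t, p)               # approximately [10.0, 1.0, -5.0]
R_inv, t_inv = invert_rigid(R, t)
p_back = apply_rigid(R_inv, t_inv, q)  # recovers p up to floating-point rounding
```

Because $R$ is orthogonal, the inverse needs only a transpose rather than a general matrix inversion.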

Six orthopedic clinicians (>10 years' experience) independently reviewed each exam; for each horn, the most diagnostic sagittal–coronal pair was selected and graded on the Stoller scale (grades 0–III, encoded as labels 0–3, from 0 = normal to 3 = severe). Chief orthopedic physicians then double-validated the annotations to ensure consensus. This design enables precise, horn-level, multi-view injury characterization absent from prior resources (Xu et al., 20 Dec 2025).

2. Population Demographics and Injury Grade Distribution

MeniMV’s patient cohort had broad demographic coverage (405 female, 345 male; age 14–82, mean 55.6 ± 12.7 years). The overall grade distribution across both meniscal horns was:

Grade	Count	Percentage (%)
0	1331	44.4
1	502	16.7
2	306	10.2
3	861	28.7
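The percentages follow directly from the counts. As a hedged illustration of how such imbalance is often quantified before training, the sketch below also derives "balanced" inverse-frequency class weights; these weights are an assumption for illustration and are not reported in the MeniMV study, which handled imbalance with a focal loss.

```python
# Grade counts from the distribution table.
counts = {0: 1331, 1: 502, 2: 306, 3: 861}
total = sum(counts.values())          # 3000
num_classes = len(counts)

# Percentage of each grade, rounded to one decimal as in the table.
percentages = {g: round(100 * n / total, 1) for g, n in counts.items()}

# "Balanced" inverse-frequency weights (hypothetical, not from the study):
# w_g = total / (num_classes * n_g), so each grade's summed weight n_g * w_g = total / num_classes.
weights = {g: total / (num_classes * n) for g, n in counts.items()}
```

Under this scheme each grade contributes the same total weight (750 of 3,000), so grade 2, the rarest, receives the largest per-sample weight (about 2.45).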

Grade 0 predominated among younger patients, whereas the prevalence of grade 3 rose sharply after age 50, peaking in the 60–70 age group. Mean ages were 56.2 (±12.2) years for males and 55.1 (±13.1) years for females, indicating similar age distributions in both sexes (Xu et al., 20 Dec 2025).

3. Benchmark Evaluation: Architectures and Training Protocol

Severity grading was posed as a multi-view classification problem, $f_\theta(\{x_\text{sag}, x_\text{cor}\}, p) \to \{0,1,2,3\}$, where $x_\text{sag}$ and $x_\text{cor}$ are the paired sagittal and coronal slices of one horn. Three architectural categories were benchmarked:

Generic backbones trained from scratch (CNNs, plus ViT-B/16): ResNet-50, ResNeXt-50 (32×4d), EfficientNet-B0, ShuffleNet-v2, MobileNet-v2, ConvNeXt-T, DenseNet-121, ViT-B/16.
Domain-specific architectures: MRNet, 2.5D ResNet Fusion, DeepKnee (adapted).
Modern pretrained backbones: ConvNeXt-V2-B, Swin-T, Swin-B, ViT-B/16 (masked autoencoder [MAE] or DINO pretraining), Swin-UNETR encoder.

Training used a hybrid objective, $L_\text{total} = \alpha L_\text{focal} + \beta L_\text{MSE}$, where the focal loss $L_\text{focal}$ addresses label imbalance and the MSE term $L_\text{MSE}$ encourages consistency between sagittal and coronal embeddings. Standard cross-entropy ($L_\text{CE}$) served as the comparison baseline. Optimization used Adam (learning rate 1e-4, weight decay 1e-5) with a batch size of 32 slice pairs and early stopping on validation Macro-F1. Data augmentation included random rotations (±15°), horizontal flips, and intensity jittering (Xu et al., 20 Dec 2025).
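A minimal per-sample sketch of the hybrid objective follows. The focal term uses the common form $-(1-p_t)^\gamma \log p_t$ without per-class weights; the values of $\alpha$, $\beta$, and $\gamma$, and the use of plain lists for logits and embeddings, are placeholder assumptions rather than settings from the study.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def focal_loss(logits, target, gamma=2.0):
    """Multi-class focal loss for one sample: -(1 - p_t)^gamma * log(p_t)."""
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

def mse(u, v):
    """Mean squared error between two equal-length embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)

def total_loss(logits, target, z_sag, z_cor, alpha=1.0, beta=0.1, gamma=2.0):
    """L_total = alpha * L_focal + beta * L_MSE; alpha, beta, gamma are placeholders."""
    return alpha * focal_loss(logits, target, gamma) + beta * mse(z_sag, z_cor)

# Toy values: uniform logits over 4 grades, true grade 0, two 2-D view embeddings.
loss = total_loss([0.0, 0.0, 0.0, 0.0], 0, [1.0, 2.0], [1.0, 0.0])
```

With $\gamma = 0$ the focal term reduces to cross-entropy, which mirrors the $L_\text{CE}$ baseline; larger $\gamma$ down-weights examples the model already classifies confidently.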

4. Experimental Results and Quantitative Benchmarks

Evaluation metrics comprised Accuracy (Acc), Macro-F1, and Mean Absolute Error (MAE) between predicted and reference grades, all computed per meniscal horn.
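The three metrics can be computed from per-horn grade labels as sketched below. This is a plain-Python illustration, not the study's evaluation code; in particular, scoring a grade that never appears in either labels or predictions as F1 = 0 is one common convention, assumed here.

```python
def accuracy(y_true, y_pred):
    """Fraction of horns whose predicted grade equals the reference grade."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred, classes=(0, 1, 2, 3)):
    """Unweighted mean of per-grade F1 = 2TP / (2TP + FP + FN)."""
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)  # absent grade scored 0 (assumed convention)
    return sum(f1s) / len(f1s)

def mean_absolute_error(y_true, y_pred):
    """Average absolute distance between predicted and reference grades."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy example: both errors are off by one grade.
y_true = [0, 1, 2, 3, 3, 0]
y_pred = [0, 2, 2, 3, 2, 0]
acc = accuracy(y_true, y_pred)             # 4/6
f1 = macro_f1(y_true, y_pred)              # (1 + 0 + 0.5 + 2/3) / 4
mae = mean_absolute_error(y_true, y_pred)  # 2/6
```

In the toy example both errors fall on adjacent grades, so accuracy drops to 4/6 while MAE stays at 1/3; MAE therefore separates near-miss errors from distant ones, which accuracy and Macro-F1 do not.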

Backbone Performance

Backbone	Acc (%)	Macro-F1	MAE
ResNet-50	57.67	0.4667	0.91
DenseNet-121	67.83	0.5730	0.74
ViT-B/16 (scratch)	50.33	0.2907	1.10
MRNet (adapted)	69.45	0.5912	0.68
2.5D ResNet Fusion	71.80	0.6150	0.63
DeepKnee (adapted)	74.22	0.6385	0.58
Swin-B (pretrained)	76.38	0.6730	0.52
Swin-UNETR (pretrained)	76.92	0.6790	0.51

Pretrained Transformer architectures, notably Swin-UNETR, achieved the highest performance (76.9% accuracy, 0.679 Macro-F1, MAE 0.51), outperforming both generic and domain-specific CNN designs. Most misclassifications occurred between adjacent grades (especially grades 1 and 2) and in posterior horns, indicating persistent difficulty in detecting subtle or early-stage tears (Xu et al., 20 Dec 2025).

Multi-View Fusion Strategies

With DenseNet-121 as a shared encoder, three fusion strategies were compared:

Fusion Strategy	Accuracy (%)	Macro-F1
Additive	70.02	0.6021
Attention-based	72.46	0.6228
Concatenation	73.63	0.6347

Concatenation-based fusion delivered the best results (73.63% accuracy, 0.6347 Macro-F1), indicating the value of combining complementary anatomical information across imaging planes while keeping each view's features intact.
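The three strategies differ mainly in how the two view embeddings are combined before the classification head (for example, a linear layer mapping the fused vector to four grade logits). The sketch below shows one simple version of each on plain lists; the attention variant, a softmax over per-view scores from a hypothetical learned vector `w`, is an assumption, since the exact attention mechanism used in the study is not described here.

```python
import math

def additive_fusion(z_sag, z_cor):
    """Element-wise sum: the fused vector keeps the embedding size d."""
    return [a + b for a, b in zip(z_sag, z_cor)]

def concat_fusion(z_sag, z_cor):
    """Concatenation: the fused vector has size 2d and keeps the views separate."""
    return list(z_sag) + list(z_cor)

def attention_fusion(z_sag, z_cor, w):
    """Softmax-weighted sum of the two views.

    `w` is a hypothetical learned scoring vector; each view's score is w . z.
    """
    scores = [sum(wi * zi for wi, zi in zip(w, z)) for z in (z_sag, z_cor)]
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    a_sag, a_cor = exps[0] / total, exps[1] / total
    return [a_sag * x + a_cor * y for x, y in zip(z_sag, z_cor)]

z_sag, z_cor = [1.0, 2.0], [3.0, 4.0]
fused_add = additive_fusion(z_sag, z_cor)               # [4.0, 6.0]
fused_cat = concat_fusion(z_sag, z_cor)                 # [1.0, 2.0, 3.0, 4.0]
fused_att = attention_fusion(z_sag, z_cor, [0.0, 0.0])  # equal weights: [2.0, 3.0]
```

Concatenation doubles the feature dimension and keeps sagittal and coronal features in separate coordinates, whereas addition and attention merge them into one vector of the original size.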

Robustness Analyses

Leave-one-center-out (LOCO) cross-center validation indicated that the pretrained Transformer Swin-UNETR generalizes more robustly to scanner and protocol variations than DeepKnee:

Model	Macro-F1 (avg)	Acc (avg %)	MAE (avg)
DeepKnee	0.625	72.5	0.60
Swin-UNETR	0.641	73.6	0.56

Demographic stratification with Swin-UNETR showed consistent performance across sexes and age groups (e.g., Macro-F1 of 0.648 for males and 0.666 for females, with MAE of 0.53 and 0.49, respectively) (Xu et al., 20 Dec 2025).

5. Key Findings and Identified Challenges

Benchmarking on MeniMV yielded several notable findings:

Pretrained Transformers, particularly Swin-UNETR, outperformed convolutional models for multi-view meniscal grading.
Concatenation-based fusion yielded the best performance among fusion strategies, suggesting that preserving independent view features is beneficial.
The highest error rates were observed at grade boundaries (notably grades 1–2) and for posterior horn tear classification.
Cross-center validation suggested that architectures with large-scale pretraining and self-attention are less sensitive to differences in scanner or acquisition protocol.
Mild lesions (Stoller grades 1–2) and small radial tears remained most susceptible to misclassification.

A plausible implication is that exploiting the complementary views available within each study, together with effective feature fusion, is crucial for high-fidelity automated grading (Xu et al., 20 Dec 2025).

6. Future Research Directions

The benchmarking study recommends the following directions for advancing automated meniscal injury grading:

Employing semi-supervised or self-supervised pretraining on full 3D MRI volumes to exploit volumetric context and unlabeled data.
Enhancing 2D–3D registration through deformable transformations to leverage volumetric anatomical information.
Integrating explicit lesion localization (region-of-interest detection, attention mapping) and multi-scale feature pyramids to improve detection of subtle or early-grade injuries.
Using ordinal-aware loss functions (e.g., label-distribution smoothing, ordinal regression) to respect the intrinsic order in meniscal severity grading.
Augmenting imaging analysis with clinical metadata such as patient history or biomechanical scores for comprehensive severity modeling.
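As one example of an ordinal-aware objective, Gaussian label-distribution smoothing replaces the one-hot target with a distribution centered on the true grade, so predicting an adjacent grade costs less than predicting a distant one. The sketch below is a hedged illustration; `sigma` and the toy probabilities are assumptions, not values proposed in the study.

```python
import math

def ordinal_soft_labels(grade, num_classes=4, sigma=0.75):
    """Gaussian label-distribution smoothing over ordinal grades (sigma is a placeholder)."""
    w = [math.exp(-((k - grade) ** 2) / (2 * sigma ** 2)) for k in range(num_classes)]
    s = sum(w)
    return [x / s for x in w]

def soft_cross_entropy(probs, target_dist, eps=1e-12):
    """Cross-entropy between a soft target distribution and predicted probabilities."""
    return -sum(q * math.log(p + eps) for p, q in zip(probs, target_dist))

target = ordinal_soft_labels(1)              # true grade 1
near = [0.05, 0.10, 0.80, 0.05]              # mass on adjacent grade 2
far = [0.05, 0.10, 0.05, 0.80]               # mass on distant grade 3
loss_near = soft_cross_entropy(near, target)
loss_far = soft_cross_entropy(far, target)   # loss_near < loss_far
```

With a one-hot target both predictions would incur the same loss, $-\log 0.10$, because only the probability assigned to grade 1 counts.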

By providing 3,000 horn-specific, multi-view, expert-annotated MRI images and a suite of rigorous baselines, MeniMV establishes a foundational resource for model development, robustness assessment, and research on clinically relevant automated musculoskeletal image analysis (Xu et al., 20 Dec 2025).

References

Xu et al. (2025). MeniMV: A Multi-view Benchmark for Meniscus Injury Severity Grading.
