
OASIS-1 Kaggle MRI Dataset

Updated 8 February 2026
  • OASIS-1 Kaggle MRI dataset is a publicly available collection of standardized, high-resolution T1-weighted brain scans with comprehensive demographic and clinical labels.
  • It supports diverse applications including Alzheimer’s disease classification, severity staging, and neuroimage-based morphometric analysis through machine learning.
  • The dataset adheres to rigorous preprocessing protocols based on Marcus et al., enabling reproducible benchmarks and reliable cross-study comparisons.

The OASIS-1 Kaggle MRI dataset is a foundational, publicly available collection of structural brain MRI scans, accompanied by rich demographic and clinical labels. It is widely used to advance machine learning–based research on Alzheimer’s disease (AD) classification, severity staging, and neuroimage-based morphometric analysis. The dataset provides T1-weighted MRI volumes from 416 right-handed adult subjects with full Clinical Dementia Rating (CDR) scoring, with data standardized and distributed in NIfTI format and comprehensive metadata CSVs. Its preprocessing conventions, class definitions, and access protocols are tightly specified by the original Marcus et al. (2007) protocol and strictly followed or lightly adapted in subsequent neuroimaging ML literature.

1. Cohort Composition and Labeling

The OASIS-1 dataset comprises 416 right-handed adults, ages 18–96, with an approximately equal gender distribution. CDR-based dementia staging allows for fine-grained labeling, with the following subject split (Miller et al., 2014, Awate, 2019, Ahmed, 18 Dec 2025):

  • CDR 0.0 (non-demented): 316 subjects
  • CDR 0.5 (very mild): 70 subjects
  • CDR 1.0 (mild): 28 subjects
  • CDR 2.0 (moderate): 2 subjects

AD severity can be binned as binary (CDR = 0 vs CDR > 0), three-way (control, very mild, mild/moderate), or four-class (control, very mild, mild, moderate), as implemented in leading ML studies (Lincoln et al., 20 May 2025, Ahmed, 18 Dec 2025, Ahmed, 1 Feb 2026). Each subject’s metadata includes age, sex, Mini-Mental State Examination (MMSE) score, eTIV, nWBV, and classification-ready CDR.
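The binning schemes above can be sketched as a small labeling helper. This is an illustrative mapping, not code from the cited studies; the bin boundaries simply follow the binary, three-way, and four-class definitions described in the text.

```python
# Hypothetical helper mapping OASIS-1 CDR scores to the label schemes
# described above; bin boundaries follow the cited class definitions.

def cdr_to_label(cdr: float, scheme: str = "four_class") -> int:
    """Map a Clinical Dementia Rating to a class index.

    scheme: 'binary'      -> 0 = non-demented (CDR 0), 1 = demented (CDR > 0)
            'three_class' -> 0 = control, 1 = very mild, 2 = mild/moderate
            'four_class'  -> 0 = control, 1 = very mild, 2 = mild, 3 = moderate
    """
    if scheme == "binary":
        return 0 if cdr == 0.0 else 1
    if scheme == "three_class":
        return {0.0: 0, 0.5: 1, 1.0: 2, 2.0: 2}[cdr]
    if scheme == "four_class":
        return {0.0: 0, 0.5: 1, 1.0: 2, 2.0: 3}[cdr]
    raise ValueError(f"unknown scheme: {scheme}")

print(cdr_to_label(0.5, "binary"))       # 1 (any CDR > 0 is demented)
print(cdr_to_label(1.0, "three_class"))  # 2 (mild and moderate share a bin)
```

Note that in the three-way scheme both CDR 1.0 and CDR 2.0 collapse into the same class, which is why the tiny moderate group (2 subjects) matters less there than in the four-class setting.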

2. MRI Acquisition Protocols and Data Structure

All MRI scans in OASIS-1 are high-resolution, T1-weighted images obtained using a Siemens Vision 1.5 T scanner with an MPRAGE sequence. Acquisition parameters (Miller et al., 2014):

  • Repetition time (TR): 9.7 ms
  • Echo time (TE): 4.0 ms
  • Inversion time (TI): 20 ms
  • Flip angle: 10°
  • Voxel dimensions: 1.0 × 1.0 × 1.25 mm³ (acquisition); final volumes resampled to 1 mm³ isotropic
  • Image matrix: 256 × 256 (acquisition); final volumes: 176 × 208 × 176 voxels

For each subject, 3–4 individual scans acquired within a single session are motion-corrected and averaged into one volume. Images are stored as NIfTI-1 (.nii.gz) volumes, one file per session, alongside a single CSV file containing subject/session identifiers and all relevant labels (Awate, 2019, Miller et al., 2014, Ahmed, 18 Dec 2025).
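The motion-correct-and-average step can be illustrated on synthetic arrays at the final grid size. This is a sketch only: the Kaggle volumes are already averaged, and the synthetic scans here stand in for real repeated acquisitions.

```python
import numpy as np

# Hedged sketch: OASIS-1 averages each subject's 3-4 repeated MPRAGE
# acquisitions into one volume. We mimic that on synthetic arrays at
# the final 176 x 208 x 176 grid; real data would come from NIfTI files.

rng = np.random.default_rng(0)
# Four synthetic "repeated scans" of one subject (mean 100, noise sd 5).
scans = [rng.normal(100.0, 5.0, size=(176, 208, 176)).astype(np.float32)
         for _ in range(4)]

# Averaging the aligned repeats boosts SNR without changing geometry.
averaged = np.mean(np.stack(scans, axis=0), axis=0)
print(averaged.shape)  # (176, 208, 176)
```

Averaging four independent acquisitions roughly halves the per-voxel noise standard deviation, which is the rationale for acquiring repeated scans in a single session.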

3. Preprocessing and Feature Extraction

Preprocessing conventions commonly include skull stripping (fMRIDC BET), bias-field correction, Talairach atlas registration, and intensity normalization, with OASIS-1 distributions on Kaggle already masked and segmented (Miller et al., 2014). Tissue segmentations (gray/white matter, CSF) are provided as separate NIfTI masks.

Processing steps vary by downstream pipeline:

  • Classical ML (Miller et al., 2014): Volumetric features (white matter, gray matter, CSF), brain symmetry measures, and principal component analysis (PCA) projections on masked slices yield compact 11-dimensional subject-wise features.
  • Deep learning pipelines (Awate, 2019, Lincoln et al., 20 May 2025, Ahmed, 18 Dec 2025, Ahmed, 1 Feb 2026): Models typically operate on 2D or 3D arrays resliced from the NIfTI files, with per-slice or per-volume normalization (to [0,1] or zero-mean/unit-variance). Augmentation can include spatial resampling, sharpening, and geometric transforms, although some recent state-of-the-art workflows (e.g., (Ahmed, 18 Dec 2025, Ahmed, 1 Feb 2026)) report no explicit augmentation, leveraging dataset curation, class balancing, and strong deep architectures.

Sample extraction protocols include selecting 40–60 central axial slices per volume (Lincoln et al., 20 May 2025), resizing to 128×128, 224×224, or 248×248 pixels, and RGB mapping (either by channel copying or pseudo-coloring) to allow the use of 2D CNN backbones or Vision Transformer architectures (Ahmed, 18 Dec 2025).
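The slice-extraction recipe above can be sketched as follows. Slice count, axis choice, and min-max normalization are illustrative assumptions (the cited pipelines vary on all three); the channel-copy at the end is the simplest of the RGB-mapping options mentioned.

```python
import numpy as np

# Hedged sketch of the 2D sample-extraction recipe described above:
# take a band of central axial slices, min-max normalize each to [0, 1],
# and replicate the channel three times so 2D RGB backbones accept it.

def extract_slices(volume: np.ndarray, n_central: int = 40) -> np.ndarray:
    """Return n_central central slices, normalized, shape (n, H, W, 3)."""
    mid = volume.shape[2] // 2            # assume axial = last axis
    lo, hi = mid - n_central // 2, mid + n_central // 2
    band = volume[:, :, lo:hi].transpose(2, 0, 1).astype(np.float32)
    # Per-slice min-max normalization to [0, 1].
    mins = band.min(axis=(1, 2), keepdims=True)
    maxs = band.max(axis=(1, 2), keepdims=True)
    band = (band - mins) / np.maximum(maxs - mins, 1e-8)
    return np.repeat(band[..., None], 3, axis=-1)  # grayscale -> "RGB"

vol = np.random.default_rng(1).random((176, 208, 176))  # stand-in volume
slices = extract_slices(vol, n_central=40)
print(slices.shape)  # (40, 176, 208, 3)
```

Resizing to 128×128 or 224×224 would follow as a separate interpolation step (e.g., via an imaging library), which is omitted here to keep the sketch dependency-free.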

4. Machine Learning Applications and Model Benchmarks

OASIS-1 is the predominant benchmark for MRI-based AD classification in the open research community. Representative pipelines include:

  • Support Vector Machine (SVM) on derived volumetric+PCA features (Miller et al., 2014): Test accuracy up to 85.7% (10-fold CV), with precision and recall 68–74%.
  • CNN and Transfer Learning (Awate, 2019): BellCNN (custom multi-stage CNN) achieves >95% accuracy for binary AD vs control; transfer learning with Inception v3/MobileNet yields 81–86% accuracy on held-out test scans.
  • Hybrid Deep/Topological Models (Ahmed, 1 Feb 2026): A TDA-plus-DenseNet121 fusion reaches 99.93% accuracy and 100% AUC in four-class OASIS-1 classification, using persistent homology-derived Betti features fused with deep CNN embeddings.
  • Advanced Attention Networks (Lincoln et al., 20 May 2025): The XDementNET model, using multi-level residual, spatial, and grouped attention blocks, achieves 99.92% accuracy for four-class, 99.90% three-class, and 99.95% binary classification tasks. Evaluation is conducted on 80,000 extracted 2D slices.
  • Vision Transformers with Pseudo-Color (Ahmed, 18 Dec 2025): PseudoColorViT-Alz attains 99.79% accuracy and 100% AUC for four-way dementia staging, demonstrating SOTA performance over recent CNN and Siamese architectures.

Benchmarking protocols vary in random split ratios (e.g., 80:20, 90:10, 72:12.75:15 for train/val/test) and in handling of class imbalance via augmentation or slice selection. Macro-averaged precision, recall, F1-score, and confusion matrices are standard outcome measures. All leading deep models report negligible misclassifications, with errors primarily between adjacent stages (very mild vs mild).
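The macro-averaged outcome measures above are straightforward to compute from a confusion matrix. The matrix values below are invented for illustration, but the error pattern mirrors the one reported: the few misclassifications fall between adjacent stages (very mild vs mild).

```python
# Hedged sketch: macro-averaged precision/recall from a 4-class
# confusion matrix (rows = true class, cols = predicted), matching
# the outcome measures listed above. Matrix values are made up.

def macro_precision_recall(cm):
    k = len(cm)
    precisions, recalls = [], []
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(k)) - tp   # column sum minus tp
        fn = sum(cm[c]) - tp                        # row sum minus tp
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(precisions) / k, sum(recalls) / k

cm = [
    [500,   0,  0, 0],   # control
    [  0, 118,  2, 0],   # very mild (2 confused with mild)
    [  0,   1, 49, 0],   # mild (1 confused with very mild)
    [  0,   0,  0, 4],   # moderate
]
p, r = macro_precision_recall(cm)
print(round(p, 4), round(r, 4))  # 0.9881 0.9908
```

Macro averaging weights each class equally, so the tiny moderate class counts as much as the control class; that is precisely why it is the standard choice for this imbalanced cohort.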

5. Data Access, Organization, and Metadata

On Kaggle, OASIS-1 is distributed as a folder containing:

  • “oasis_cross-sectional.csv”: Demographics and CDR labels
  • NIfTI files: OAS1_XXXX_MR1_mprage.nii.gz (one per subject/session)
  • Segmentation files (optional): gray/white matter, CSF

Researchers link images to CDR scores by matching the MRI ID in the CSV. The dataset is "analysis ready" for both volumetric and deep learning workflows; all core preprocessing (skull stripping, bias correction, registration) has been completed (Miller et al., 2014, Ahmed, 18 Dec 2025).
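The ID-matching step can be done with the standard library alone. This sketch assumes the common Kaggle release's column names (`ID`, `CDR`) and the `OAS1_XXXX_MR1_mprage.nii.gz` filename convention; both should be checked against your copy of the dataset.

```python
import csv
import io
import re

# Hedged sketch: join NIfTI filenames to CDR labels via the MRI ID that
# appears both in the filename (OAS1_XXXX_MR1_mprage.nii.gz) and in the
# "ID" column of oasis_cross-sectional.csv. The two-row CSV below is a
# stand-in for the real metadata file.

csv_text = """ID,M/F,Age,MMSE,CDR
OAS1_0001_MR1,F,74,29,0
OAS1_0003_MR1,F,73,27,0.5
"""

labels = {row["ID"]: row["CDR"]
          for row in csv.DictReader(io.StringIO(csv_text))}

def cdr_for_file(path: str) -> str:
    """Look up the CDR label for a NIfTI file by its embedded MRI ID."""
    mri_id = re.search(r"(OAS1_\d{4}_MR\d)", path).group(1)
    return labels[mri_id]

print(cdr_for_file("scans/OAS1_0003_MR1_mprage.nii.gz"))  # 0.5
```

In a real pipeline the CSV would be read from `oasis_cross-sectional.csv` on disk, and the same lookup would drive train/validation/test splitting at the subject level.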

6. Clinical, Scientific, and Algorithmic Significance

OASIS-1 provides a unique, well-characterized neuroimaging testbed for algorithm development in the context of Alzheimer's disease. By virtue of its standardization, comprehensive metadata, and wide adoption, it enables precise cross-study comparison and reproducibility in AD-related classification and morphometric research.

7. Limitations and Protocol Considerations

All studies relying on OASIS-1 as distributed on Kaggle inherit its properties:

  • Age distribution is skewed toward older adults (many subjects >60 years).
  • Moderate and severe AD classes are underrepresented (only 2 subjects with CDR = 2), leading to class imbalance in multi-class scenarios (Ahmed, 1 Feb 2026, Ahmed, 18 Dec 2025).
  • 2D slice-based learning protocols leverage extensive slice sampling per volume, but may not fully reflect subject-level diagnostic realities.
  • No indication of augmented or synthetic data is given in the original releases; reported augmentation strategies are pipeline dependent.
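One common response to the imbalance noted above is inverse-frequency class weighting in the loss function. The sketch below derives such weights from the published four-class subject counts; the weighting formula is a standard convention, not one prescribed by the cited studies.

```python
# Hedged sketch: inverse-frequency class weights for the four-class
# OASIS-1 split (316 / 70 / 28 / 2 subjects). Such weights are one
# common way pipelines counter the scarcity of moderate-AD cases.

counts = {"control": 316, "very_mild": 70, "mild": 28, "moderate": 2}
total = sum(counts.values())      # 416 subjects
n_classes = len(counts)

# weight_c = total / (n_classes * count_c): rare classes weigh more.
weights = {c: total / (n_classes * n) for c, n in counts.items()}
for c, w in weights.items():
    print(f"{c}: {w:.2f}")
```

With only 2 moderate subjects the resulting weight (52.0) dwarfs the control weight (0.33), which illustrates why many pipelines instead merge moderate with mild or rely on slice-level sampling rather than loss weighting alone.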

A plausible implication is that, while models tuned to OASIS-1 achieve extremely high reported accuracy, generalization to more heterogeneous or larger neuroimaging datasets may require careful consideration of class distribution, imaging protocol variance, and real-world prevalence.


Key References:

  • "Support vector machine classification of dimensionally reduced structural MRI images for dementia" (Miller et al., 2014)
  • "Detection of Alzheimers Disease from MRI using Convolutional Neural Networks, Exploring Transfer Learning And BellCNN" (Awate, 2019)
  • "Hybrid Topological and Deep Feature Fusion for Accurate MRI-Based Alzheimer's Disease Severity Classification" (Ahmed, 1 Feb 2026)
  • "XDementNET: An Explainable Attention Based Deep Convolutional Network to Detect Alzheimer Progression from MRI data" (Lincoln et al., 20 May 2025)
  • "Colormap-Enhanced Vision Transformers for MRI-Based Multiclass (4-Class) Alzheimer's Disease Classification" (Ahmed, 18 Dec 2025)
  • "Application of Unsupervised Domain Adaptation for Structural MRI Analysis" (Reddy, 2022)
