Swedish BioFINDER-2: Alzheimer’s Biomarker Study

Updated 22 July 2025

The Swedish BioFINDER-2 Study is a large-scale, prospective project that systematically identifies multidimensional biomarkers for early Alzheimer’s detection and subtyping.
It integrates omics, neuroimaging, and clinical data to improve diagnostic accuracy (e.g., combining plasma metabolic signatures with clinical predictors raises the AUC from ~0.74 to ~0.79) and to support personalized risk stratification.
Advanced machine learning and network-based analyses are employed to translate biomarker discoveries into scalable tools for clinical trials and targeted therapeutic strategies.

The Swedish BioFINDER-2 Study is a large-scale prospective research project focused on elucidating the molecular underpinnings, risk factors, and biomarker signatures of Alzheimer’s disease and related dementias, with a particular emphasis on early detection, individualized risk stratification, and population-level applicability. Its design and aims draw on the evolving landscape of multimodal biomarker and real-world data research, and BioFINDER-2 is a pivotal node in the international effort to connect omics, imaging, machine learning, and electronic health records for dementia research and clinical implementation.

1. Study Scope and Objectives

The principal objectives of the Swedish BioFINDER-2 Study encompass the systematic identification, validation, and integration of multidimensional biomarkers—genomic, metabolic, proteomic, imaging, and clinical—for the early diagnosis, prognosis, and subtyping of Alzheimer’s disease (AD) and related neurodegenerative disorders. The study’s design incorporates deeply phenotyped cohorts with biosamples (blood, CSF), neuroimaging (MRI, PET, including novel PET tracers), cognitive testing, and comprehensive documentation of lifestyle, environmental, and demographic variables. Emphasis is placed on building forward-compatible biobanks and data repositories amenable to advanced machine learning analyses and cross-cohort harmonization.

A key feature is the explicit goal of integrating blood-based biomarker discovery—leveraging the accessibility and scalability of plasma assays—alongside high-dimensional imaging and genetic data to provide both practical tools for clinical risk stratification and mechanistic insight into AD pathophysiology (Leeuw et al., 2017).

2. Biomarkers: Profiles, Signatures, and Networks

Blood-based metabolic profiling is a cornerstone of the study. It draws on methodologies such as those in "Blood-based metabolic signatures in Alzheimer’s disease" (Leeuw et al., 2017), which serves as a blueprint for BioFINDER-2’s metabolic research. When nested linear models are fully adjusted for confounders (age, sex, BMI, mean arterial pressure), distinct signatures emerge: 26 metabolites are robustly differentially abundant between AD patients and controls, predominantly reduced levels of amines (e.g., 2-aminoadipic acid, tyrosine, methyldopa) and triglycerides (e.g., TG(51:3), TG(54:6), TG(56:8)). Notably, SM(d18:1/20:1) is elevated in AD. In addition, combining plasma signatures with conventional clinical predictors improves classification, raising the AUC from ~0.74 to ~0.79 and indicating that metabolic data add diagnostic value.
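A minimal sketch of a confounder-adjusted nested comparison for a single metabolite follows. It assumes that "nested" means a reduced model with confounders only versus a full model that adds AD status, both fitted by ordinary least squares. The function name `nested_f_test` is hypothetical, and the sketch omits p-values and the multiple-testing correction that a screen across many metabolites would need.

```python
import numpy as np


def nested_f_test(metabolite, diagnosis, confounders):
    """Compare a reduced model (intercept + confounders) with a full model
    (intercept + confounders + diagnosis) for one metabolite, using OLS.

    Returns the confounder-adjusted diagnosis coefficient and the F statistic
    for adding the diagnosis term.
    """
    n = len(metabolite)
    intercept = np.ones((n, 1))
    X_reduced = np.hstack([intercept, confounders])
    X_full = np.hstack([X_reduced, diagnosis.reshape(-1, 1)])

    def fit(X):
        beta, *_ = np.linalg.lstsq(X, metabolite, rcond=None)
        resid = metabolite - X @ beta
        return float(resid @ resid), beta  # residual sum of squares, coefficients

    rss_reduced, _ = fit(X_reduced)
    rss_full, beta_full = fit(X_full)
    df_extra = X_full.shape[1] - X_reduced.shape[1]  # one extra term: diagnosis
    df_resid = n - X_full.shape[1]
    f_stat = ((rss_reduced - rss_full) / df_extra) / (rss_full / df_resid)
    return beta_full[-1], f_stat
```

Run once per metabolite, the sign of the adjusted coefficient indicates whether levels are lower or higher in AD, and the F statistic (converted to a p-value and FDR-controlled in practice) ranks candidate metabolites.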

Network-based regulatory analysis using targeted fused ridge estimation of precision matrices reveals an altered core biochemical network architecture in AD, with central hubs such as lysophosphatidic acid C18:2, glycylglycine, glutamine, and platelet-activating factor C16:0. APOE ε4 status further stratifies network topology: ε4-positive AD patients show a more cohesive, amine-centered inner core, suggesting genotype-specific metabolic rewiring that is likely relevant to pathobiological subtyping and the design of targeted therapies.
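The fused estimator jointly estimates class-specific precision matrices (e.g., AD and controls) with an additional penalty on their differences. The sketch below shows only its single-group special case: the targeted ridge precision estimator in the closed form given by van Wieringen and Peeters (2016), followed by partial correlations and a simple degree count as a crude hub measure. The edge threshold and all function names are illustrative assumptions, not values or code from the study.

```python
import numpy as np


def _sym_sqrt(A):
    """Square root of a symmetric positive-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * np.sqrt(vals)) @ vecs.T


def targeted_ridge_precision(S, lam, T=None):
    """Single-group targeted ridge precision estimate (not the fused version):

        Omega = { [lam*I + (S - lam*T)^2 / 4]^(1/2) + (S - lam*T) / 2 }^(-1)

    S is the sample covariance, lam > 0 the penalty, T the target (default 0).
    """
    p = S.shape[0]
    T = np.zeros((p, p)) if T is None else T
    D = S - lam * T
    inner = _sym_sqrt(lam * np.eye(p) + D @ D / 4.0)
    return np.linalg.inv(inner + D / 2.0)


def partial_correlations(omega):
    """Convert a precision matrix into partial correlations."""
    d = np.sqrt(np.diag(omega))
    P = -omega / np.outer(d, d)
    np.fill_diagonal(P, 1.0)
    return P


def hub_degrees(P, threshold=0.1):
    """Count edges per node with |partial correlation| above a hypothetical threshold."""
    A = (np.abs(P) > threshold).astype(int)
    np.fill_diagonal(A, 0)
    return A.sum(axis=1)
```

As the penalty shrinks toward zero the estimate approaches the ordinary inverse covariance; larger penalties keep the estimate positive definite even when metabolites outnumber samples.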

3. Imaging, Machine Learning, and Risk Prediction

BioFINDER-2 incorporates multimodal brain imaging, including FDG-PET metabolic imaging and structural MRI. Its machine learning approaches for prognosis and diagnosis are informed by models such as those described by Popuri et al. (2017) and Tam et al. (2021). In FDG-PET workflows, multi-scale ensemble classifiers are built on patch-wise standardized uptake value ratio (SUVR) features extracted from gray matter subdivisions at multiple scales. The FDG-PET DAT (dementia of the Alzheimer’s type) Score (FPDS) is defined as:

$\mathrm{FPDS} = \frac{1}{M \times F} \sum_{i=1}^{M \times F} p_i$

where $p_i$ is the output of the $i$-th classifier, $M$ is the number of feature spaces, and $F$ is the number of subagged training sets. This ensemble achieves an AUC of ~0.78 for classifying DAT versus non-DAT trajectories and state-of-the-art performance in predicting conversion from mild cognitive impairment (MCI) to DAT (AUCs of 0.81, 0.80, and 0.77 for 2-, 3-, and 5-year conversion windows).
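The FPDS formula is a plain average of classifier outputs over $M$ feature spaces and $F$ subagged training sets. A minimal sketch follows, assuming each feature space is a matrix of patch-wise SUVR features at one scale and using a hand-rolled logistic regression as a stand-in for the base classifiers Popuri et al. (2017) actually used. The names `fpds` and `fit_logistic`, and the subsampling fraction, are hypothetical.

```python
import numpy as np


def fit_logistic(X, y, lr=0.1, n_iter=500):
    """Plain gradient-descent logistic regression (illustrative base classifier)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w


def predict_proba(w, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))


def fpds(train_spaces, y, test_spaces, F=10, frac=0.5, seed=0):
    """Average M*F classifier outputs: one per (feature space, subagged set).

    train_spaces / test_spaces: lists of M feature matrices (one per scale).
    Subagging = training each classifier on a random subsample drawn
    without replacement (here a fraction `frac` of the training subjects).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    outputs = []
    for X_tr, X_te in zip(train_spaces, test_spaces):  # M feature spaces
        for _ in range(F):                             # F subagged training sets
            idx = rng.choice(n, size=int(frac * n), replace=False)
            w = fit_logistic(X_tr[idx], y[idx])
            outputs.append(predict_proba(w, X_te))     # p_i for each test subject
    return np.mean(outputs, axis=0)                    # (1 / (M*F)) * sum_i p_i
```

Averaging over both scales and subsamples reduces the variance of any single classifier, which is the main motivation for this kind of ensemble on high-dimensional imaging features.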

Predictive modeling of cognitive decline uses support vector machines trained on cognitive scores (MMSE, CDR-SB), MRI-derived regional gray matter volumes, demographic variables, and APOE ε4 carrier status (Tam et al., 2021). Validated with cross-validation and on external datasets, these models achieve AUCs of 0.79 (early AD) and 0.71 (presymptomatic individuals). They inform participant enrichment strategies that could reduce required clinical trial sample sizes by up to 51%, increasing efficiency and statistical power for biomarker-driven intervention studies.
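The sample-size saving from enrichment follows from the standard two-arm formula $n = 2\left((z_{1-\alpha/2} + z_{1-\beta})\,\sigma/\delta\right)^2$ per arm. A minimal sketch, assuming hypothetically that treatment slows decline by a fixed fraction, so enrolling predicted fast decliners enlarges the detectable difference $\delta$ while $\sigma$ stays the same. The decline figures are illustrative and are not taken from Tam et al. (2021).

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(delta, sigma, alpha=0.05, power=0.8):
    """Per-arm sample size for a two-sided, two-sample z-test on mean decline."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # z_{1 - alpha/2}
    z_beta = z.inv_cdf(power)           # z_{1 - beta}
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


# Hypothetical scenario: treatment slows decline by 25%; outcome SD = 2.0 points/year.
effect = 0.25
unenriched = n_per_arm(effect * 1.0, 2.0)  # mean placebo decline 1.0 points/year
enriched = n_per_arm(effect * 1.5, 2.0)    # enrolling predicted decliners: 1.5 points/year
reduction = 1 - enriched / unenriched
```

With these illustrative values the formula gives 1005 participants per arm without enrichment and 447 with it, a reduction of about 56%. The real saving depends on how much the model's selection raises expected decline and whether it also changes the outcome variance.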

4. Integration of Real-World Data and Risk Factor Ontologies

BioFINDER-2 is influenced by large-scale, systematic reviews of AD risk and protective factors, exemplified by the extraction of 477 risk factors across 10 categories (genomic, disease, lifestyle, biomarker, medication, procedure, family history, environment, socioeconomic, demographics) (Chen et al., 3 Feb 2024). The integration of real-world data—structured (EHR codes, labs, prescriptions) and unstructured (clinical narratives)—is crucial for addressing heterogeneity and improving the generalizability of research findings. Structured EHRs reliably capture disease, medication, and basic biomarker information, while unstructured data offer access to lifestyle and environmental exposures.

Genomic data acquisition remains challenging, largely due to the low prevalence of standardized genetic testing and limitations in EHR storage formats. The systematic use of NLP to mine biomedical literature and clinical documents provides an avenue to continuously update risk factor databases and knowledge maps. BioFINDER-2 thus adopts interactive knowledge map frameworks (e.g., Neo4j) as enabling tools for hypothesis generation, cross-domain linking, and exploratory analytics.
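A knowledge map of this kind can be prototyped without a graph database. The sketch below is a minimal in-memory version built on a simple property-graph model. Nodes are risk factors tagged with one of the 10 categories and a likely data source: the source is taken from the text where it names one and marked "unspecified" otherwise. Edges are typed relations. The class, relation, and factor names are hypothetical, and this is not the Neo4j API. In Neo4j the same structure would be stored as labelled nodes and relationships and queried with Cypher.

```python
from collections import defaultdict

# The 10 risk-factor categories listed in the text (Chen et al.).
CATEGORIES = {
    "genomic", "disease", "lifestyle", "biomarker", "medication",
    "procedure", "family history", "environment", "socioeconomic", "demographics",
}

# Primary data source per category as described in the text; categories the
# text does not assign fall back to "unspecified" (an assumption).
PRIMARY_SOURCE = {
    "disease": "structured EHR",
    "medication": "structured EHR",
    "biomarker": "structured EHR",
    "lifestyle": "clinical narratives (NLP)",
    "environment": "clinical narratives (NLP)",
    "genomic": "sparse (limited genetic testing)",
}


class KnowledgeMap:
    """Minimal in-memory stand-in for a property graph of risk factors."""

    def __init__(self):
        self.nodes = {}                # factor name -> attribute dict
        self.edges = defaultdict(set)  # factor name -> {(relation, target)}

    def add_factor(self, name, category):
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        self.nodes[name] = {
            "category": category,
            "source": PRIMARY_SOURCE.get(category, "unspecified"),
        }

    def link(self, source, relation, target):
        self.edges[source].add((relation, target))

    def factors_by_category(self, category):
        return sorted(n for n, attrs in self.nodes.items() if attrs["category"] == category)

    def shared_targets(self, a, b):
        """Cross-domain linking: targets reachable from both a and b in one hop."""
        return {t for _, t in self.edges[a]} & {t for _, t in self.edges[b]}
```

For example, adding "APOE ε4" (genomic) and "hypertension" (disease), each linked to "Alzheimer's disease" by a `RISK_FACTOR_FOR` relation, lets `shared_targets` connect a genomic and a clinical factor through a common node. An NLP pipeline over literature and clinical notes would feed new factors and links into the same structure.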

5. Multimodal Biomarker Pipelines and Interpretability

Integrating magnetoencephalography (MEG) and MRI features gives the best classification performance for MCI and early AD detection (Ahmad et al., 9 Aug 2024). Using source-localization techniques (LCMV, eLORETA) and a feature selection pipeline (GLMNET with a LASSO penalty), combining uncorrected MEG features with z-score-standardized MRI features yields higher accuracy (76.3%) and AUC (0.82) than single-modality approaches. The sparse coefficient estimates provide directly interpretable biomarker signatures, identifying neuroanatomically plausible regions (frontal and temporal cortex, hippocampus, entorhinal cortex) and frequency-specific oscillatory disruptions in MEG. This approach highlights the value of multimodal, interpretable machine learning frameworks for translational biomarker research in BioFINDER-2.
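A minimal sketch of the fusion and selection step follows, assuming MEG source-space features (e.g., band power per region after LCMV or eLORETA) have already been extracted. Raw MEG features are concatenated with z-scored MRI features, and an L1-penalised logistic regression is fitted by proximal gradient descent. This solver is swapped in for illustration (glmnet itself uses coordinate descent) but minimises the same kind of LASSO objective. Function names and the penalty value are hypothetical.

```python
import numpy as np


def zscore(X):
    """Standardise each column to mean 0 and SD 1 (constant columns left unscaled)."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0
    return (X - mu) / sd


def fuse_features(meg, mri):
    """Concatenate raw ('uncorrected') MEG features with z-scored MRI features."""
    return np.hstack([meg, zscore(mri)])


def lasso_logistic(X, y, lam=0.1, lr=0.1, n_iter=3000):
    """Minimise mean logistic loss + lam * ||w||_1 by proximal gradient (ISTA)."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w = w - lr * (X.T @ (prob - y) / n)                       # gradient step
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)   # soft-thresholding
        b -= lr * np.mean(prob - y)                               # unpenalised intercept
    return w, b
```

Nonzero entries of `w` index the selected features; mapping them back to region and frequency-band labels yields the kind of sparse, interpretable signature described above.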

6. Graph-Based and Systems-Level Biomarker Discovery

Recent methodologies prioritize the discovery of interconnected biomarker networks and their perturbations in AD. The BRAIN (Biomarker Representation, Analysis, and Interpretation Network) framework combines multiple machine learning models with SHAP-based (SHapley Additive exPlanations) feature importance aggregation and graph-theoretic visualization to extract comprehensive and interpretable biomarker panels (Khalid et al., 27 Nov 2024). Key elements include:

Ensemble model training (e.g., logistic regression, random forests, shallow MLPs) on blood-based biomarkers, with bootstrapped aggregation of SHAP importance.
Construction of correlation-weighted graphs, using a threshold $\alpha$ to prune weak links.
Identification of sub-network modules (for example, CA-19-9/Eotaxin-3/AgRP; FASL/Fibrinogen/MIF; THPO/IL‑12p40/TNF‑α), whose interactions differ between AD and control cohorts.

This network approach illuminates interdependencies among biological pathways and supports drug discovery by revealing actionable sub-networks. Such frameworks transfer naturally to BioFINDER-2’s biomarker datasets, where they can enable population-level, cost-effective screening and advance systems-level understanding of AD pathogenesis.
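The graph-construction step can be sketched as follows, assuming Pearson correlations between biomarkers as edge weights, a threshold $\alpha$ on their absolute value, and connected components of the pruned graph treated as modules. The component-based module definition and all names are assumptions for illustration. The BRAIN framework's own module detection and its bootstrapped SHAP ranking, which would decide which biomarkers enter the graph, are not reproduced here.

```python
import numpy as np


def correlation_graph(X, names, alpha=0.5):
    """Weighted adjacency dict: keep edges with |Pearson r| >= alpha."""
    R = np.corrcoef(X, rowvar=False)  # columns of X are biomarkers
    graph = {name: {} for name in names}
    p = len(names)
    for i in range(p):
        for j in range(i + 1, p):
            if abs(R[i, j]) >= alpha:  # prune weak links
                graph[names[i]][names[j]] = R[i, j]
                graph[names[j]][names[i]] = R[i, j]
    return graph


def modules(graph):
    """Connected components of the pruned graph (iterative depth-first search)."""
    seen, comps = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(graph[node].keys() - comp)
        seen |= comp
        comps.append(comp)
    return comps
```

Comparing the modules obtained separately from AD and control samples, at the same $\alpha$, is one simple way to see which sub-networks are rewired between cohorts.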

7. Clinical and Translational Implications

The central ambition of BioFINDER-2—to deliver scalable, robust, and mechanistically anchored biomarker tools for AD—is increasingly attainable due to methodological advances in omics, imaging, machine learning, and real-world data analytics. The identification of blood-based panels with demonstrated utility in early diagnosis (AUC improvements when combined with clinical variables), the stratification of cohort participants based on multi-modal risk models, and the implementation of interactive, updatable knowledge infrastructures all serve to accelerate translation into clinical practice.

Implications include:

Earlier, non-invasive detection and monitoring of AD through plasma biomarker panels and network signatures
Improved trial design via predictive enrichment, reducing costs and increasing treatment effect detectability
Personalized medicine approaches grounded in genotype, metabolic pathway, and bio-network distinctions
Scalable risk assessment tools with direct applicability to diverse and disadvantaged populations

A plausible implication is that as the field moves towards comprehensive systems-level analyses, studies such as BioFINDER-2 will form the empirical bedrock for precision prevention and intervention strategies, ultimately transforming the epidemiology and management of Alzheimer’s disease.