mmcmcBayes: Region-Level DMR Detection in EWAS
- mmcmcBayes is an R package that uses a region-centric, multistage Bayesian MCMC framework to detect differentially methylated regions (DMRs) in epigenomic studies.
- It models regional methylation patterns with a flexible alpha-skew generalized normal distribution and adaptively splits genomic regions based on Bayes factors.
- The package offers comprehensive tools for summarizing, comparing, and visualizing DMRs, serving as an effective alternative to CpG-level aggregation methods.
mmcmcBayes is an R package designed for region-level detection of differentially methylated regions (DMRs) in epigenome-wide association studies (EWAS). It implements a multistage Markov chain Monte Carlo (MCMC) framework that directly models regional methylation patterns, employing a flexible skewed distribution for summary statistics and adaptively splitting genomic regions based on Bayesian evidence measures. The package provides functions for summarizing, comparing, and visualizing detected regions, and is positioned as a region-level alternative to conventional CpG-aggregation strategies (Yang et al., 4 Feb 2026).
1. Objective and Rationale
The principal objective of mmcmcBayes is to detect DMRs—contiguous blocks of CpG sites with consistent, group-specific differences in DNA methylation levels. In contrast to methods such as bumphunter, DMRcate, DSS, and bsseq, which typically conduct CpG-level tests followed by spatial aggregation, mmcmcBayes treats regions as fundamental units. This region-centric approach models sample-wise summaries over candidate regions, testing region-level hypotheses directly.
Key motivations include the ability to:
- Accommodate spatial correlation of CpGs within genomic regions without imposing ad hoc thresholds for grouping.
- Capture skewed or multimodal regional methylation patterns, often encountered in heterogeneous biological samples.
- Evaluate evidence for differential methylation via Bayes factors, removing the need for permutation-based p-value calibration.
A DMR identified by mmcmcBayes is a region where the joint methylation distribution differs between two groups, commonly corresponding to regulatory elements or disease biomarkers (Yang et al., 4 Feb 2026).
2. Statistical Modeling Framework
Data Summarization
For each subject $j$, group $m$ (cancer or control), and candidate region $k$ at stage $\ell$, the input is the mean $\beta$-value across CpGs in the segment, mapped to an M-value via an offset logit for numerical stability:
$y_{jmk}^\ell = \log\left(\frac{\beta_{jmk}^\ell + c}{1 - \beta_{jmk}^\ell + c}\right),\quad c = 10^{-6}$
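The transformation can be sketched in a few lines of base R. The helper name `beta_to_m` and the argument `offset` are hypothetical and not part of the package; the sketch only assumes a matrix of beta-values with CpGs in rows and samples in columns.

```r
# Offset logit from beta-values to M-values (natural log, as in the formula).
# `beta_to_m` is a hypothetical helper, not a function exported by mmcmcBayes.
beta_to_m <- function(beta, offset = 1e-6) {
  stopifnot(all(beta >= 0 & beta <= 1))
  log((beta + offset) / (1 - beta + offset))
}

# Toy segment: 3 CpGs (rows) by 2 samples (columns), filled column-wise.
beta_mat <- matrix(c(0.10, 0.20, 0.30,
                     0.80, 0.90, 1.00), nrow = 3)

# Region-level summary: per-sample mean beta over the segment, then transform.
m_values <- beta_to_m(colMeans(beta_mat))
```

The offset keeps the transform finite even when a beta-value is exactly 0 or 1.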
Distributional Model
The alpha-skew generalized normal (ASGN) distribution, with four parameters (location, scale, skewness, and shape), is used to model $y_{jmk}^\ell$.
Parameter roles:
| Parameter | Description |
|---|---|
| Location | Central location; shifts the distribution |
| Scale | Width of the distribution |
| Skewness | Direction (right or left) and extent of asymmetry |
| Shape | Tail weight |
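Within each segment $k$, at stage $\ell$ and for group $m$, the values $y_{jmk}^\ell$ are assumed i.i.d. ASGN. Priors are placed on each parameter, including an inverse-gamma prior on the squared scale, $\sigma_{mk}^{2,\ell} \sim \text{IG}(A_d, B_d)$. Hyperparameters are either user-specified or weakly informative, and posterior means from the preceding stage are reused at the current stage (Yang et al., 4 Feb 2026).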
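To make the four parameter roles concrete, the following is a minimal sketch of one way to write an alpha-skew generalized normal density: a generalized normal kernel multiplied by the alpha-skew factor and renormalized. The function name `dasgn`, its argument names, and this particular parametrisation are assumptions made for illustration; the parametrisation used inside mmcmcBayes may differ.

```r
# Illustrative ASGN density. ASSUMPTION: this parametrisation (mu, sigma,
# alpha, beta) is a sketch and is not taken from the mmcmcBayes source.
dasgn <- function(y, mu = 0, sigma = 1, alpha = 0, beta = 2) {
  stopifnot(sigma > 0, beta > 0)
  z <- (y - mu) / sigma
  # Generalized normal base density: shape `beta` controls tail weight.
  g <- beta / (2 * gamma(1 / beta)) * exp(-abs(z)^beta)
  # Second moment of z under g, needed to renormalize the skewed density.
  second_moment <- gamma(3 / beta) / gamma(1 / beta)
  # Alpha-skew factor: alpha = 0 recovers the symmetric base density.
  skew_factor <- ((1 - alpha * z)^2 + 1) / (2 + alpha^2 * second_moment)
  skew_factor * g / sigma
}

# Numerical check that the density integrates to one.
total_mass <- integrate(dasgn, -Inf, Inf,
                        mu = 0.5, sigma = 1.2, alpha = 0.8, beta = 1.5)$value
```

For larger absolute values of `alpha` the skew factor can produce two modes, which is one way a single family can cover both skewed and bimodal regional summaries.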
3. Bayesian Evidence Evaluation
Hypotheses and Bayes Factor
For region $k$ at stage $\ell$:
- $H_0$: All samples share one ASGN.
- $H_1$: Cancer and control groups have distinct ASGN distributions.
The Bayes factor compares the marginal likelihood of the data under $H_1$ with that under $H_0$.
In practice, the marginal likelihoods are approximated by plugging posterior mean MCMC estimates into the likelihood.
Interpretation: larger BF values signal stronger evidence for differential methylation. Thresholds for declaring or splitting regions are user-defined (default: c(0.5, 0.8, 1.05)), with lower values producing finer segmentation and higher values yielding greater conservatism (Yang et al., 4 Feb 2026).
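The plug-in idea can be illustrated with a deliberately simplified sketch. Two loud assumptions apply: a normal density stands in for the ASGN, and the sample mean and standard deviation stand in for posterior mean MCMC estimates. The function names are hypothetical, and the quantity computed here is a plain log likelihood ratio, which is not necessarily on the same scale as the Bayes factor reported by the package.

```r
# Plug-in log-likelihood of one sample under a fitted density.
# ASSUMPTION: normal stand-in for the ASGN; mean/sd stand in for
# posterior-mean estimates. Not the package's implementation.
plugin_loglik <- function(y) {
  sum(dnorm(y, mean = mean(y), sd = sd(y), log = TRUE))
}

# H1 fits each group separately; H0 fits the pooled sample.
plugin_log_ratio <- function(y_cancer, y_normal) {
  loglik_h1 <- plugin_loglik(y_cancer) + plugin_loglik(y_normal)
  loglik_h0 <- plugin_loglik(c(y_cancer, y_normal))
  loglik_h1 - loglik_h0
}

set.seed(1)
# Groups with clearly different region-level M-values.
shifted <- plugin_log_ratio(rnorm(19, mean = 1), rnorm(19, mean = -1))
# Groups drawn from the same distribution.
similar <- plugin_log_ratio(rnorm(19), rnorm(19))
```

In this toy setting the separated groups produce a much larger ratio than the matched groups, which is the behaviour a threshold on the evidence measure relies on.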
4. Multistage Region-Splitting Strategy
The detection algorithm proceeds in stages:
- Initial stage: the whole chromosome or block is treated as one segment.
- For each segment at stage $\ell$:
  - Compute mean M-values for each group.
  - Fit the ASGN under $H_0$ and $H_1$ (with asgn_func) and obtain posterior means.
  - Compute the Bayes factor.
  - If the Bayes factor meets the threshold for stage $\ell$:
    - If $\ell$ is below max_stages, split the segment into num_splits subregions for the next stage.
    - If $\ell$ equals max_stages, declare the segment a final DMR.
  - If the Bayes factor falls below the threshold, take no further action on the segment.
- Iteration stops when max_stages is reached or no new segments qualify.
- Output is a table of DMRs (chromosome, CpG range, CpG count, BF, stage).
The region-splitting mechanism adaptively refines DMR candidates, avoiding arbitrary aggregation of CpG-level signals and focusing resolution according to statistical evidence (Yang et al., 4 Feb 2026).
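The control flow of the splitting procedure can be sketched in base R. Everything here is hypothetical scaffolding: `multistage_split` is not the package function, `score_fn` is a placeholder for the ASGN-based Bayes factor, and the toy score (absolute difference in group means) is chosen only so the example runs without MCMC. Declaring a segment final when it is too short to split is a choice made for this sketch.

```r
# Sketch of multistage splitting. ASSUMPTION: `score_fn` stands in for the
# Bayes factor; this is not the mmcmcBayes implementation.
multistage_split <- function(cancer, normal, score_fn,
                             thresholds = c(0.5, 0.8, 1.05),
                             max_stages = 3, num_splits = 2) {
  segments <- list(c(1, nrow(cancer)))  # stage 1: one segment over all CpGs
  dmrs <- list()
  for (stage in seq_len(max_stages)) {
    next_segments <- list()
    for (seg in segments) {
      idx <- seg[1]:seg[2]
      # Per-sample mean over the CpGs in the segment, for each group.
      score <- score_fn(colMeans(cancer[idx, , drop = FALSE]),
                        colMeans(normal[idx, , drop = FALSE]))
      if (score < thresholds[stage]) next  # below threshold: no action
      if (stage == max_stages || length(idx) < num_splits) {
        # Final stage (or segment too short to split): record as a DMR.
        dmrs[[length(dmrs) + 1]] <- data.frame(
          start = seg[1], end = seg[2], n_cpg = length(idx),
          score = score, stage = stage)
      } else {
        # Split into num_splits contiguous pieces for the next stage.
        piece <- ceiling(seq_along(idx) * num_splits / length(idx))
        for (g in split(idx, piece)) {
          next_segments[[length(next_segments) + 1]] <- range(g)
        }
      }
    }
    segments <- next_segments
    if (length(segments) == 0) break  # nothing left to refine
  }
  do.call(rbind, dmrs)
}

# Toy data: 40 CpGs by 6 samples, with CpGs 11-20 shifted in the cancer group.
set.seed(42)
n_cpg <- 40; n_samp <- 6
normal <- matrix(rnorm(n_cpg * n_samp, sd = 0.1), n_cpg, n_samp)
cancer <- matrix(rnorm(n_cpg * n_samp, sd = 0.1), n_cpg, n_samp)
cancer[11:20, ] <- cancer[11:20, ] + 3

abs_mean_diff <- function(a, b) abs(mean(a) - mean(b))  # toy evidence score
res <- multistage_split(cancer, normal, abs_mean_diff)
```

On this toy input the whole block passes stage 1, only the first half passes stage 2, and the single region covering CpGs 11 to 20 is declared at stage 3.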
5. Package Implementation and Workflow
Installation and Dependencies
mmcmcBayes is available from CRAN:
```r
install.packages("mmcmcBayes")
```
Core Functions
| Function | Purpose |
|---|---|
| mmcmcBayes | Main region-level DMR detection (returns DMR data.frame) |
| asgn_func | Fits ASGN via MCMC; returns posterior estimates of the ASGN parameters |
| summarize_dmrs | Summarizes detected DMRs (counts, region sizes, Bayes factors) |
| compare_dmrs | Computes overlaps between two DMR results |
| plot_dmr_region | Plots group mean M-values across CpGs in region |
Example Workflow
- Prepare two methylation data.frames (CpGs by sample, sorted by genomic position).
- Run detection:
```r
# Detect DMRs with default settings
rst <- mmcmcBayes(cancer_data, normal_data)
# Summarize the result and plot the first four regions
dmr_summary <- summarize_dmrs(rst)
plot_dmr_region(rst, cancer_data, normal_data, dmr_index = 1:4)
```
Parameters such as max_stages, num_splits, MCMC control, and thresholds are adjustable (Yang et al., 4 Feb 2026).
6. Empirical Performance and Application
Simulation Study
- Simulations (chr6 baseline + Gaussian noise, 10 synthetic DMRs per replicate, various lengths/effect sizes) show that, at max_stages=2, increased splits raise FDR with negligible sensitivity change.
- At max_stages=3, sensitivity increases up to ~50 splits, with FDR controlled below ~10%, after which both FDR and computational time rise sharply.
- Default parameters (max_stages=3, num_splits=50, bf_thresholds=(0.5, 0.8, 1.05)) achieve sensitivity ~80–90% and FDR ~5–10% (Yang et al., 4 Feb 2026).
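Sensitivity and FDR for region-level calls are often computed from interval overlaps. The sketch below assumes one such convention (a detected region is a true positive if it overlaps any true DMR, and a true DMR is recovered if any detected region overlaps it); the exact definitions used in the simulation study are not stated here, and the function and column names are hypothetical.

```r
# TRUE if the interval [start, end] overlaps any interval in `ref`
# (a data.frame with columns `start` and `end`, inclusive CpG positions).
overlaps_any <- function(start, end, ref) {
  any(start <= ref$end & end >= ref$start)
}

# Overlap-based sensitivity and FDR. ASSUMPTION: this overlap convention
# is illustrative and may differ from the one used in the paper.
dmr_metrics <- function(detected, truth) {
  detected_hit <- mapply(overlaps_any, detected$start, detected$end,
                         MoreArgs = list(ref = truth))
  truth_hit <- mapply(overlaps_any, truth$start, truth$end,
                      MoreArgs = list(ref = detected))
  c(sensitivity = mean(truth_hit), fdr = mean(!detected_hit))
}

truth <- data.frame(start = c(100, 500, 900), end = c(150, 560, 950))
detected <- data.frame(start = c(110, 300, 905, 520),
                       end   = c(140, 320, 960, 530))
metrics <- dmr_metrics(detected, truth)
```

In this toy example all three true regions are recovered and one of the four detected regions overlaps nothing, giving a sensitivity of 1 and an FDR of 0.25.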
Real Data
- Application to Illumina 450K lung cancer data (chr6; 36,438 CpGs; 19 samples per group) detected 1,514 DMRs at stage 3, with a median region size of 15 CpGs and Bayes factors ranging from 1.05 to about 1.96; visualization showed heterogeneity across DMRs (Yang et al., 4 Feb 2026).
7. Recommendations and Usage Considerations
mmcmcBayes provides a region-level Bayesian framework that models distributional shifts via the ASGN and guides segmentation with Bayes factors. Recommended default settings (Illumina-type data):
- max_stages = 3, num_splits = 50
- bf_thresholds = c(0.5, 0.8, 1.05)
- MCMC: nburn=5000, niter=10000, thin=1
Best practices:
- Pre-sort CpGs by chromosome and position.
- Apply the analysis in parallel by chromosome if necessary.
- Use summarize_dmrs and plot_dmr_region to inspect Bayes factor distributions and DMR configuration.
mmcmcBayes complements CpG-level tools, especially when methylation distributions are skewed or multimodal and region-level differences are not well captured by site-wise aggregation (Yang et al., 4 Feb 2026).
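The sorting and per-chromosome recommendations can be sketched in base R. The annotation columns `cpg`, `chr`, and `pos` and the sample column `s1` are hypothetical names; the input layout expected by the package should be checked against its documentation.

```r
# Toy CpG table. ASSUMPTION: column names are illustrative only.
cpgs <- data.frame(
  cpg = c("cg03", "cg01", "cg04", "cg02"),
  chr = c("chr6", "chr6", "chr7", "chr6"),
  pos = c(300L, 100L, 50L, 200L),
  s1  = c(0.31, 0.12, 0.55, 0.20),
  stringsAsFactors = FALSE
)

# Sort by chromosome, then by genomic position within chromosome.
# Note: character ordering places "chr10" before "chr2".
sorted <- cpgs[order(cpgs$chr, cpgs$pos), ]

# One data.frame per chromosome; each can be analysed independently,
# for example with lapply() or parallel::mclapply().
by_chr <- split(sorted, sorted$chr)
```

Splitting after sorting keeps CpGs in positional order within each chromosome, which region-level methods rely on when forming contiguous segments.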