Time-Domain Random-Masking Augmentation
- Time-domain random-masking augmentation is a stochastic technique that applies a binary Bernoulli mask to simulate missing data in time-series imputation tasks.
- It employs independent masking with specified probabilities (e.g., 0.1) to generate artificial missingness for both augmentation (RMEO) and overlay (RMOD) paradigms.
- Empirical results show that in-mini-batch overlay masking yields lower imputation errors, guiding best practices for robust model evaluation.
Time-domain random-masking augmentation is a technique used in the evaluation and training of time series imputation models, whereby artificial missingness is introduced into observed data via stochastic masking. This strategy has become prevalent in healthcare time series applications, especially for benchmarking model accuracy and robustness, yet it faces scrutiny regarding its ability to represent practical clinical missingness patterns (Qian et al., 2024). At its core, the approach relies on independently masking observed features with a specified probability, serving as a controlled methodology for model assessment and augmentation.
1. Formalism and Definition of Random-Masking Operators
Given a completely observed time-feature matrix $X \in \mathbb{R}^{T \times D}$, the time-domain random-masking procedure introduces a binary mask matrix $M \in \{0, 1\}^{T \times D}$, resulting in the masked input $\tilde{X} = X \odot M$. Under the missing completely at random (MCAR) regime, each entry is independently sampled as:

$$M_{t,d} \sim \mathrm{Bernoulli}(1 - p),$$

where $p$ denotes the masking ratio (proportion of entries to be masked). The operation $\odot$ denotes element-wise multiplication, effectively zeroing masked entries. No explicit support for block-masking (contiguous runs) or temporal masks is provided, though such schemes are referenced in related work as possible extensions.
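The sampling step can be illustrated with a minimal NumPy sketch. The function name `sample_mcar_mask`, the toy matrix shape, and the convention that 1 means "kept" and 0 means "masked" are assumptions made for illustration; they are not taken from the paper or from PyPOTS.

```python
import numpy as np

def sample_mcar_mask(shape, mask_ratio, rng):
    """Return a binary mask: 1 = kept, 0 = artificially masked (assumed convention)."""
    # Each entry is dropped independently with probability mask_ratio.
    return (rng.random(shape) >= mask_ratio).astype(np.float64)

rng = np.random.default_rng(0)
X = rng.normal(size=(48, 7))             # toy (time steps x features) matrix
M = sample_mcar_mask(X.shape, 0.1, rng)  # masking ratio of 0.1
X_masked = X * M                         # element-wise product zeroes masked entries
```

Because the entries are drawn independently, the realized fraction of masked entries only matches the masking ratio in expectation and varies slightly from draw to draw.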
2. Sampling Procedures: Augmentation and Overlay
Two distinct random-masking paradigms are implemented within PyPOTS-based workflows:
- Augmentation (RMEO): For each mini-batch during model training, a fresh mask is sampled and applied only to originally observed entries. Entries that are already missing are exempt from masking; artificial missingness is introduced only within the observed subset.
- Overlay (RMOD): A mask $M$ is sampled as above but is applied across all entries, including those that were originally missing. The final observed mask becomes $M^{\mathrm{obs}} \odot M$, where $M^{\mathrm{obs}}$ denotes the original observation mask.
No stratification or incorporation of clinical missing-not-at-random (MNAR) rules is used; all missingness is stochastic and pointwise (MCAR).
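The two paradigms can be sketched as follows, reading the descriptions above literally. This is an illustrative NumPy sketch, not the PyPOTS implementation; the function names and the boolean mask convention (`True` = observed) are assumptions.

```python
import numpy as np

def augmentation_mask(observed, mask_ratio, rng):
    """Sample artificial missingness only on originally observed entries."""
    obs = observed.astype(bool)
    keep = np.ones(obs.shape, dtype=bool)
    # Draw keep/drop decisions for the observed entries only.
    keep[obs] = rng.random(int(obs.sum())) >= mask_ratio
    return keep & obs                  # final observed mask

def overlay_mask(observed, mask_ratio, rng):
    """Sample a mask over every entry, then combine it with the original mask."""
    keep = rng.random(observed.shape) >= mask_ratio
    return keep & observed.astype(bool)  # product of original and sampled masks

rng = np.random.default_rng(0)
observed = rng.random((24, 5)) >= 0.3  # toy original observation mask
aug = augmentation_mask(observed, 0.1, rng)
ovl = overlay_mask(observed, 0.1, rng)
```

In both cases the final observed mask is a subset of the original one; the variants differ in where the random draws are made.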
3. Hyperparameter Selection and Implementation Strategies
Key hyperparameters are as follows:
- Mask ratio $p$: Evaluated values include $0.05$, $0.10$, and $0.20$, corresponding to 5%, 10%, and 20% missing rates.
- Masking schedule:
  - Pre-mask: A single mask is sampled for the entire training set and kept constant.
  - In-mini-batch: The mask $M$ is resampled for each mini-batch, serving as data augmentation.
No explicit window size or number of contiguous blocks is parameterized, since only independent Bernoulli masking is considered. Optimal performance, in terms of mean absolute error (MAE) and mean squared error (MSE), is generally observed at $p \approx 0.1$.
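The difference between the two schedules can be made concrete with a small sketch. The helper names `make_pre_mask` and `iter_masked_batches`, the `(samples, time, features)` layout, and the batch size are hypothetical choices for illustration, written in plain NumPy rather than against any PyPOTS API.

```python
import numpy as np

def make_pre_mask(shape, mask_ratio, rng):
    """Pre-mask schedule: sample one mask for the whole training set."""
    return rng.random(shape) >= mask_ratio

def iter_masked_batches(X, batch_size, mask_ratio, rng, pre_mask=None):
    """Yield (masked_batch, mask) pairs for one epoch."""
    for start in range(0, len(X), batch_size):
        batch = X[start:start + batch_size]
        if pre_mask is not None:
            # Pre-mask: reuse the same fixed mask in every epoch.
            mask = pre_mask[start:start + batch_size]
        else:
            # In-mini-batch: draw a fresh mask each time the batch is served.
            mask = rng.random(batch.shape) >= mask_ratio
        yield batch * mask, mask

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 12, 3))              # toy (samples, time, features) array
pre = make_pre_mask(X.shape, 0.1, rng)
fixed_epoch = list(iter_masked_batches(X, 4, 0.1, rng, pre_mask=pre))
fresh_epoch = list(iter_masked_batches(X, 4, 0.1, rng))
```

With a pre-mask, every epoch sees identical missingness; without one, each pass over the data exposes the model to a new pattern.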
4. Comparative Quantitative Results
The paper systematically evaluates random-masking variants but does not contrast them against clinically-informed or structured masking. Results indicate that in-mini-batch masking achieves lower imputation error than pre-masking, with overlay (RMOD) marginally ahead of augmentation (RMEO):
| Masking Method | Dataset | TimesNet MAE | Relative Performance |
|---|---|---|---|
| Augmentation (RMEO) | ETTm1 | $0.119$ | Slightly higher than RMOD |
| Overlay (RMOD) | ETTm1 | $0.110$ | Marginally best |
| Augmentation (RMEO) | PhysioNet | $0.211$ | Reference level |
| Augmentation (RMEO) | Air | $0.163$ | Comparable to overlay |
Pre-mask experiments are more stable but yield MAE values 1–5% above those of the in-mini-batch variants. Only pure random masking is compared; no AUROC or RMSE is reported, nor are clinical MNAR masks included.
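For reference, imputation errors of this kind are commonly computed only on the artificially held-out entries, where ground truth is known. The sketch below assumes that convention; the function names and the `holdout` indicator are hypothetical, and the paper's exact evaluation code may differ.

```python
import numpy as np

def masked_mae(pred, target, holdout):
    """Mean absolute error over entries where holdout is True."""
    holdout = np.asarray(holdout, dtype=bool)
    if not holdout.any():
        raise ValueError("holdout mask selects no entries")
    return float(np.abs(pred - target)[holdout].mean())

def masked_mse(pred, target, holdout):
    """Mean squared error over entries where holdout is True."""
    holdout = np.asarray(holdout, dtype=bool)
    if not holdout.any():
        raise ValueError("holdout mask selects no entries")
    return float(((pred - target) ** 2)[holdout].mean())

target = np.array([[1.0, 2.0], [3.0, 4.0]])
pred = np.array([[1.0, 0.0], [0.0, 4.0]])
holdout = np.array([[False, True], [True, False]])  # artificially masked entries
mae = masked_mae(pred, target, holdout)  # (|0 - 2| + |0 - 3|) / 2 = 2.5
mse = masked_mse(pred, target, holdout)  # (4 + 9) / 2 = 6.5
```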
5. Best Practice Recommendations
Empirical findings support specific recommendations:
- Prefer in-mini-batch random masking over single pre-masks to maximize exposure to variation in artificial missingness and enhance regularization effects.
- Set the masking ratio near $0.1$ to balance difficulty and augmentation benefit.
- Overlay masking (RMOD) yields marginally lower imputation error than simple augmentation in most cases.
- Apply normalization after masking for realistic deployment in incomplete-data scenarios.
- Report all experimental details, namely mask ratio $p$, masking method (augment vs. overlay), timing (pre vs. in-batch), and normalization order, and use a consistent sweep for benchmarking.
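One way to read the "normalize after masking" recommendation is to compute the normalization statistics from only those entries that remain observed once the mask has been applied. The sketch below illustrates that reading; the function name, the NaN encoding of missing values, and per-feature standardization are assumptions, not details confirmed by the source.

```python
import numpy as np

def normalize_after_masking(X, observed):
    """Standardize each feature using only entries that remain observed."""
    X_nan = np.where(observed, X, np.nan)  # encode masked entries as NaN
    mean = np.nanmean(X_nan, axis=0)       # statistics ignore masked entries
    std = np.nanstd(X_nan, axis=0)
    std = np.where(std > 0, std, 1.0)      # guard against constant features
    return (X_nan - mean) / std, mean, std

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(200, 4))  # toy (time x features) data
observed = rng.random(X.shape) >= 0.1              # mask applied first
Z, mean, std = normalize_after_masking(X, observed)
```

Computing the statistics before masking would instead let information from the held-out entries leak into the scaling, which is not available when the data are genuinely incomplete.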
6. Extensions and Limitations
The discussed framework applies exclusively stochastic, pointwise random-masking. While no block-masking or MNAR schemes are implemented, extension would require replacement of Bernoulli sampling with procedures such as:
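For $k = 1, \dots, K$:
- Choose start $s_k \sim \mathrm{Uniform}\{1, \dots, T - L + 1\}$,
- Set $M_{t,d} = 0$ for all $t \in \{s_k, \dots, s_k + L - 1\}$,
- Tune $K$ and $L$ to satisfy $KL/T \approx p$.

Here $K$ is the number of blocks, $L$ the block length, $T$ the number of time steps, and $p$ the target masking ratio.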
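A block-masking extension along these lines might look like the following sketch. It is not part of the evaluated framework: the function name, the choice to mask all features within a block, and the rule for deriving the number of blocks from the target ratio are all assumptions made for illustration.

```python
import numpy as np

def sample_block_mask(n_steps, n_features, mask_ratio, block_len, rng):
    """Mask contiguous runs of block_len time steps across all features."""
    if not 1 <= block_len <= n_steps:
        raise ValueError("block_len must lie in [1, n_steps]")
    # Number of blocks chosen so that n_blocks * block_len / n_steps ~ mask_ratio.
    n_blocks = int(round(mask_ratio * n_steps / block_len))
    mask = np.ones((n_steps, n_features), dtype=bool)  # True = observed
    # Uniform start positions; the upper bound keeps each block inside the series.
    starts = rng.integers(0, n_steps - block_len + 1, size=n_blocks)
    for s in starts:
        mask[s:s + block_len, :] = False
    return mask

rng = np.random.default_rng(0)
block_mask = sample_block_mask(n_steps=200, n_features=3, mask_ratio=0.1,
                               block_len=5, rng=rng)
realized_ratio = 1.0 - block_mask.mean()
```

Blocks may overlap, so the realized ratio can fall below the target; a stricter variant would resample overlapping starts.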
There is no evaluation of structured missing patterns or their downstream predictive implications in the context of mortality AUROC or other clinical metrics. This suggests that current random-masking augmentation may overestimate practical performance in clinical settings and motivates the adoption of clinically-informed masking for future research (Qian et al., 2024).