ScheduledDropPath: Adaptive NASNet Regularization
- ScheduledDropPath is a stochastic regularization technique for NASNet, dynamically increasing drop rates during training to stabilize feature learning.
- It employs a linear schedule from p_min=0 to p_max, with a scaling factor that preserves activation magnitudes, ensuring appropriate regularization strength.
- Empirical results on CIFAR-10 and ImageNet demonstrate that ScheduledDropPath outperforms fixed-rate DropPath, leading to significant performance gains.
ScheduledDropPath is a stochastic regularization technique designed to improve generalization in neural architectures with multi-branch cells, particularly those developed via neural architecture search such as NASNet. It extends the fixed-rate DropPath approach by modulating the probability of dropping computational paths as a function of training progress, thus addressing key deficiencies in traditional stochastic path regularization for deep, over-parameterized, multi-branch structures (Zoph et al., 2017).
1. Motivation for ScheduledDropPath
Standard DropPath regularization independently drops each computational branch in a multi-branch cell with a fixed probability p. Early empirical observations in NASNet training revealed that a constant drop rate was inadequate: small values of p led to insufficient regularization, while large values disrupted signal propagation during the critical early phases of feature learning. ScheduledDropPath was developed to provide gentle regularization during initial training, when low-level filters are forming, and progressively stronger regularization as the network's representational motifs mature. This temporal adaptation is essential for stabilizing learning and enhancing generalization in NASNet-style computational graphs (Zoph et al., 2017).
2. Mathematical Formulation
Let E denote the total number of training epochs and t the current epoch (or a suitably normalized training index). Two hyperparameters govern the schedule:
- p_min: initial drop probability at the start of training (p_min = 0 in NASNet).
- p_max: maximum drop probability, reached at the end of training (epoch E).
The drop probability is scheduled via a linear ramp:

p(t) = p_min + (p_max − p_min) · (t / E)

For each path at training epoch t, the branch output x is replaced by

x̃ = 0 with probability p(t),  x̃ = x / (1 − p(t)) otherwise.

The scaling factor 1/(1 − p(t)) preserves the expected activation magnitude, following the principle of inverted dropout.
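The expectation-preserving property of the inverted-dropout scaling can be sanity-checked with a short Monte Carlo estimate. The sketch below uses a hypothetical helper `drop_path` (not NASNet code) applied to a scalar activation:

```python
import random

def drop_path(x, p, training=True):
    """Drop a branch activation x with probability p; rescale survivors
    by 1/(1-p) so that E[output] == x (inverted-dropout scaling)."""
    if not training or p == 0.0:
        return x
    if random.random() < p:   # branch is dropped
        return 0.0
    return x / (1.0 - p)      # branch survives, rescaled

# Monte Carlo check that the expected activation magnitude is preserved
random.seed(0)
n, p, x = 200_000, 0.3, 1.0
mean = sum(drop_path(x, p) for _ in range(n)) / n  # ≈ 1.0
```

Without the 1/(1 − p) factor, the expected output would shrink to (1 − p)·x, and test-time activations (where no dropping occurs) would be systematically larger than those seen during training.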
3. Scheduling Strategy and Hyperparameters
ScheduledDropPath employs a zero-initialized ramp, setting p_min = 0 and linearly increasing the drop probability to p_max at the final epoch E. Published NASNet experiments report dataset-specific choices of p_max and E for CIFAR-10 and ImageNet (Zoph et al., 2017). A sweep over p_max showed that a moderate maximum drop rate yielded the best trade-off between regularization strength and gradient flow for NASNet architectures.
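The ramp itself is straightforward to compute. The snippet below uses illustrative values for p_max and E (not the published NASNet settings) and a hypothetical helper name `scheduled_drop_prob`:

```python
def scheduled_drop_prob(t, E, p_max, p_min=0.0):
    """Linear ramp from p_min at epoch 0 to p_max at epoch E."""
    return p_min + (p_max - p_min) * min(t / E, 1.0)

# Illustrative settings only: E = 300 epochs, p_max = 0.5
E, p_max = 300, 0.5
ramp = [scheduled_drop_prob(t, E, p_max) for t in (0, 150, 300)]
# → [0.0, 0.25, 0.5]
```

Clamping with `min(t / E, 1.0)` keeps the drop probability at p_max if training runs past epoch E, e.g. during fine-tuning.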
4. Implementation in NASNet Cells
ScheduledDropPath is applied to every distinct branch in a NASNet cell during the forward pass, using the same schedule p(t). The procedure operates as follows:
- For each input branch, draw an independent Bernoulli mask with success probability $1-p(t)$.
- If the branch is dropped, output zero; otherwise, scale activations by $1/(1-p(t))$.
- No dropping occurs at test time.
NASCellBlock Pseudocode

```
function NASCellBlock(h_a, h_b, t):
    x_a ← op_a(h_a)
    x_b ← op_b(h_b)
    p ← p_max * (t / E)              # linear ramp schedule
    keep_mask_a ← Bernoulli(1 − p)
    keep_mask_b ← Bernoulli(1 − p)
    if keep_mask_a == 0:
        x_a ← 0
    else:
        x_a ← x_a / (1 − p)
    if keep_mask_b == 0:
        x_b ← 0
    else:
        x_b ← x_b / (1 − p)
    h_out ← combine_fn(x_a, x_b)
    return h_out
```
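The pseudocode above translates directly into Python. The sketch below assumes, as the pseudocode does, that `op_a`, `op_b`, and `combine_fn` are caller-supplied operations; it is a minimal illustration rather than the NASNet implementation:

```python
import random

def nas_cell_block(h_a, h_b, t, E, p_max, op_a, op_b, combine_fn,
                   training=True):
    """One NASNet-style block with ScheduledDropPath on both branches."""
    x_a, x_b = op_a(h_a), op_b(h_b)
    p = p_max * (t / E)                  # linear ramp schedule
    if training and p > 0.0:
        # independent keep decision per branch, inverted-dropout scaling
        x_a = 0.0 if random.random() < p else x_a / (1.0 - p)
        x_b = 0.0 if random.random() < p else x_b / (1.0 - p)
    return combine_fn(x_a, x_b)

# At t = 0 the schedule gives p = 0, so the block is deterministic
out = nas_cell_block(2.0, 3.0, t=0, E=100, p_max=0.4,
                     op_a=lambda v: v + 1.0, op_b=lambda v: v * 2.0,
                     combine_fn=lambda a, b: a + b)  # → 9.0
```

Passing `training=False` (or t = 0) skips dropping entirely, matching the rule that no dropping occurs at test time.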
5. Comparison to Fixed-Rate DropPath
DropPath with a fixed probability p regularizes all training phases equally. If p is set low, NASNet’s over-capacity is insufficiently constrained; if p is high, sensitivity to dropped paths during early training can severely impair feature acquisition. ScheduledDropPath’s zero-to-high ramp allows networks to learn robust low-level filters initially, applying strong regularization only as higher-level representations solidify. Empirically, fixed-rate DropPath yields only marginal improvements or, if poorly tuned, degrades performance, whereas ScheduledDropPath consistently delivers marked generalization gains in experiments on CIFAR-10 and ImageNet (Zoph et al., 2017).
6. Empirical Performance and Ablations
Empirical validation in (Zoph et al., 2017) demonstrates the impact of ScheduledDropPath:
| Experiment | Test Error / Top-1 Acc. | Regularization |
|---|---|---|
| CIFAR-10 NASNet-A (7@2304) baseline | ~3.4% error | none |
| + Fixed-rate DropPath | ~3.25% error | moderate |
| + ScheduledDropPath | 2.97% error | strong, ramped |
| + ScheduledDropPath + Cutout | 2.40% error | state-of-the-art |
| ImageNet NASNet-A (7@1920) baseline | ~79.5% top-1 | none |
| + Fixed-rate DropPath | ~80.0% top-1 | moderate |
| + ScheduledDropPath | 80.8% top-1 | strong, ramped |
| ImageNet NASNet-A (6@4032) + ScheduledDP | 82.7% top-1 (best published) | strong, ramped |
ScheduledDropPath led to state-of-the-art CIFAR-10 and ImageNet performances, with significant reductions in computational complexity (FLOPs) relative to previous best models. The compound effect of ScheduledDropPath with other regularizers (e.g., cutout) enabled test error reductions approaching one full percentage point. A plausible implication is that ramped stochastic regularization synergizes particularly effectively with neural architecture search–based models comprising many parallel computational paths.
7. Broader Implications
ScheduledDropPath’s development was motivated by architectural search–designed convolutional models (NASNet) with deep, multi-branch cells. By providing epoch-dependent stochastic path dropping, it addresses limitations of static regularization in dynamic learning environments. Its efficacy in large-scale image recognition benchmarks established a methodological precedent for time-varying regularization in deep networks. The approach remains significant for future neural architecture search efforts, especially where the learned topologies induce overparameterization and complex inter-branch dependencies (Zoph et al., 2017).