ScheduledDropPath: Adaptive NASNet Regularization

Updated 5 February 2026
  • ScheduledDropPath is a stochastic regularization technique for NASNet, dynamically increasing drop rates during training to stabilize feature learning.
  • It employs a linear schedule from p_min=0 to p_max, with a scaling factor that preserves activation magnitudes, ensuring appropriate regularization strength.
  • Empirical results on CIFAR-10 and ImageNet demonstrate that ScheduledDropPath outperforms fixed-rate DropPath, leading to significant performance gains.

ScheduledDropPath is a stochastic regularization technique designed to improve generalization in neural architectures with multi-branch cells, particularly those developed via neural architecture search such as NASNet. It extends the fixed-rate DropPath approach by modulating the probability of dropping computational paths as a function of training progress, thus addressing key deficiencies in traditional stochastic path regularization for deep, over-parameterized, multi-branch structures (Zoph et al., 2017).

1. Motivation for ScheduledDropPath

Standard DropPath regularization independently drops each computational branch in a multi-branch cell with a fixed probability $p$. Early empirical observations in NASNet training revealed that a constant drop rate was inadequate: small $p$ values led to insufficient regularization, while large values disrupted signal propagation during the critical early phases of feature learning. ScheduledDropPath was developed to provide gentle regularization during initial training, when low-level filters are forming, and progressively stronger regularization as the network's representational motifs mature. This temporal adaptation is essential for stabilizing learning and enhancing generalization in NASNet-style computational graphs (Zoph et al., 2017).

2. Mathematical Formulation

Let $E$ denote the total number of training epochs and $t \in [0, E]$ indicate the current epoch or a suitably normalized training index. Two hyperparameters govern the schedule:

  • $p_{\min}$: initial drop probability at the start of training ($t=0$).
  • $p_{\max}$: maximum drop probability at the end of training ($t=E$).

The drop probability $p(t)$ is scheduled via a linear ramp:

$$p(t) = p_{\min} + (p_{\max} - p_{\min})\,\frac{t}{E}$$

For each path at training epoch $t$, the branch output $x \in \mathbb{R}^{H \times W \times C}$ is replaced by

$$\widetilde{x} = \begin{cases} 0, & \text{with probability } p(t) \\ \dfrac{x}{1-p(t)}, & \text{with probability } 1-p(t) \end{cases}$$

The scaling factor $\frac{1}{1-p(t)}$ preserves the expected activation magnitude, following the principle of inverted dropout.
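
As a quick numerical check, using the CIFAR-10 settings reported below ($p_{\min}=0$, $p_{\max}=0.5$, $E=600$; the particular epoch chosen here is only illustrative), halfway through training the drop rate sits at half its final value and surviving branches are modestly up-scaled:

$$p(300) = 0 + (0.5 - 0)\,\frac{300}{600} = 0.25, \qquad \widetilde{x} = \frac{x}{1 - 0.25} \approx 1.33\,x \quad \text{(for kept branches)}$$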

3. Scheduling Strategy and Hyperparameters

ScheduledDropPath employs a zero-initialized ramp, setting $p_{\min}=0$ and linearly increasing the drop probability to $p_{\max}$ at epoch $E$. Published NASNet experiments report the following settings:

  • CIFAR-10: $E=600$, $p_{\min}=0.0$, $p_{\max}=0.5$
  • ImageNet: $E=350$, $p_{\min}=0.0$, $p_{\max}=0.5$

A sweep over $p_{\max} \in \{0.3, 0.4, 0.5\}$ showed that $p_{\max}=0.5$ yielded the best trade-off between regularization strength and gradient flow for NASNet architectures.
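
The ramp itself is straightforward to compute. The following is a minimal Python sketch; the function name and the decision to clamp the progress ratio to $[0, 1]$ are illustrative assumptions, not details prescribed by the paper.

def scheduled_drop_prob(epoch: int, total_epochs: int,
                        p_min: float = 0.0, p_max: float = 0.5) -> float:
    """Linear ramp from p_min at epoch 0 to p_max at the final epoch."""
    progress = min(max(epoch / total_epochs, 0.0), 1.0)  # clamp to [0, 1]
    return p_min + (p_max - p_min) * progress

# Example: CIFAR-10 schedule (E = 600, p_max = 0.5)
assert scheduled_drop_prob(0, 600) == 0.0
assert scheduled_drop_prob(300, 600) == 0.25
assert scheduled_drop_prob(600, 600) == 0.5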

4. Implementation in NASNet Cells

ScheduledDropPath is applied to every distinct branch in a NASNet cell during the forward pass, using the same schedule $p(t)$. The procedure operates as follows:

  • For each input branch, draw an independent Bernoulli mask with success probability $1-p(t)$.
  • If the branch is dropped, output zero; otherwise, scale activations by $1/(1-p(t))$.
  • No dropping occurs at test time.

NASCellBlock Pseudocode

function NASCellBlock(h_a, h_b, t):
  x_a ← op_a(h_a)
  x_b ← op_b(h_b)
  p ← p_max * (t / E)    # linear ramp schedule
  keep_mask_a ← Bernoulli(1 - p)
  keep_mask_b ← Bernoulli(1 - p)
  if keep_mask_a == 0:
    x_a ← 0
  else:
    x_a ← x_a / (1 - p)
  if keep_mask_b == 0:
    x_b ← 0
  else:
    x_b ← x_b / (1 - p)
  h_out ← combine_fn(x_a, x_b)
  return h_out
In practice, this process is incorporated into each multi-branch block within the overall cell structure (Zoph et al., 2017).
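
For readers who want something executable, the following is a minimal PyTorch-style sketch of the same per-branch dropping logic. The class name, the per-example mask shape, and the way the current epoch is supplied are implementation assumptions rather than details fixed by the paper.

import torch
import torch.nn as nn

class ScheduledDropPath(nn.Module):
    # Drops an entire branch output with probability p(t) and rescales
    # surviving activations by 1 / (1 - p(t)) (inverted-dropout scaling).
    def __init__(self, p_max: float = 0.5, total_epochs: int = 600):
        super().__init__()
        self.p_max = p_max
        self.total_epochs = total_epochs
        self.current_epoch = 0  # assumed to be updated by the training loop

    def set_epoch(self, epoch: int) -> None:
        self.current_epoch = epoch

    def drop_prob(self) -> float:
        # Linear ramp from 0 to p_max over training (p_min = 0).
        return self.p_max * min(self.current_epoch / self.total_epochs, 1.0)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        p = self.drop_prob()
        if not self.training or p == 0.0:
            return x  # no dropping at test time
        keep_prob = 1.0 - p
        # One Bernoulli draw per example; broadcasting zeroes the whole branch output.
        mask_shape = (x.shape[0],) + (1,) * (x.dim() - 1)
        keep_mask = torch.bernoulli(
            torch.full(mask_shape, keep_prob, device=x.device, dtype=x.dtype)
        )
        return x * keep_mask / keep_prob

In a NASNet-style cell, one such module would wrap each branch output before the combine operation, with set_epoch called once per epoch from the training loop.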

5. Comparison to Fixed-Rate DropPath

DropPath with a fixed probability $p$ regularizes all training phases equally. If $p$ is set low, NASNet’s over-capacity is insufficiently constrained; if $p$ is high, sensitivity to dropped paths during early training can severely impair feature acquisition. ScheduledDropPath’s zero-to-high ramp allows networks to learn robust low-level filters initially, applying strong regularization only as higher-level representations solidify. Empirically, fixed-rate DropPath yields only marginal improvements or, if poorly tuned, degrades performance. ScheduledDropPath consistently delivers marked generalization gains in experiments across CIFAR-10 and ImageNet (Zoph et al., 2017).

6. Empirical Performance and Ablations

Empirical validation in (Zoph et al., 2017) demonstrates the impact of ScheduledDropPath:

Experiment                                   | Test Error / Top-1 Acc.      | Regularization
CIFAR-10 NASNet-A (7@2304) baseline          | ~3.4% error                  | none
+ Fixed-rate DropPath (p = 0.25)             | ~3.25% error                 | moderate
+ ScheduledDropPath (p: 0 → 0.5)             | 2.97% error                  | strong, ramped
+ ScheduledDropPath + Cutout                 | 2.40% error                  | state-of-the-art
ImageNet NASNet-A (7@1920) baseline          | ~79.5% top-1                 | none
+ Fixed-rate DropPath                        | ~80.0% top-1                 | moderate
+ ScheduledDropPath                          | 80.8% top-1                  | strong, ramped
ImageNet NASNet-A (6@4032) + ScheduledDP     | 82.7% top-1 (best published) | strong, ramped

ScheduledDropPath led to state-of-the-art performance on CIFAR-10 and ImageNet, with significant reductions in computational complexity (FLOPs) relative to previous best models. The compound effect of ScheduledDropPath with other regularizers (e.g., cutout) enabled test error reductions approaching one full percentage point. A plausible implication is that ramped stochastic regularization synergizes particularly effectively with neural architecture search–based models comprising many parallel computational paths.

7. Broader Implications

ScheduledDropPath’s development was motivated by architectural search–designed convolutional models (NASNet) with deep, multi-branch cells. By providing epoch-dependent stochastic path dropping, it addresses limitations of static regularization in dynamic learning environments. Its efficacy in large-scale image recognition benchmarks established a methodological precedent for time-varying regularization in deep networks. The approach remains significant for future neural architecture search efforts, especially where the learned topologies induce overparameterization and complex inter-branch dependencies (Zoph et al., 2017).

References

Zoph, B., Vasudevan, V., Shlens, J., & Le, Q. V. (2017). Learning Transferable Architectures for Scalable Image Recognition. arXiv:1707.07012.
