CompACT Pedestrian Detection

  • CompACT is a complexity-aware pedestrian detection framework that balances accuracy and computational cost by integrating features with varying computational budgets.
  • It employs a Lagrangian formulation and boosting algorithm to prioritize inexpensive features in early cascade stages and reserve complex features for later stages.
  • Empirical evaluations on the Caltech and KITTI benchmarks show that CompACT achieves state-of-the-art detection accuracy at substantially lower computational cost than comparably accurate detectors.

Complexity-Aware Pedestrian Detection (CompACT) is a framework for constructing cascaded pedestrian detectors that explicitly optimize a trade-off between detection accuracy and computational complexity. CompACT enables the seamless integration of features possessing widely varying computational costs, including both classical image features and deep convolutional neural network (CNN) activations, within a single learned detector. By penalizing unnecessary computation, CompACT allocates inexpensive features to the early cascade stages and reserves complex, high-capacity features for later stages, when only a small subset of ambiguous input windows remains. This approach yields a detector that is both accurate and computationally efficient across challenging pedestrian detection benchmarks (Cai et al., 2015).

1. Lagrangian Formulation: Joint Optimization of Accuracy and Complexity

CompACT is motivated by the need to balance detection accuracy against computational constraints. Classical AdaBoost minimizes the empirical classification risk
$$R_E[F] \simeq \frac{1}{|S|}\sum_{i=1}^{|S|} e^{-y_i F(x_i)},$$
where $F(x)$ is the detector score function, $y_i \in \{\pm 1\}$ are the labels, and $S$ is the training set.

CompACT introduces a second risk term, the "complexity risk" $R_C[F]$, to quantify average computational cost. Detector learning is then cast as the constrained problem
$$\min_{F} R_E[F] \quad \text{s.t.} \quad R_C[F] \leq \gamma,$$
which is equivalent, via a Lagrange multiplier $\eta \ge 0$, to minimizing the composite objective
$$\mathcal{L}[F] = R_E[F] + \eta R_C[F]. \tag{1}$$

$R_E[F]$ measures classification error and $R_C[F]$ penalizes excessive computation, making the trade-off explicit. The complexity risk is defined analogously to $R_E[F]$:
$$R_C[F] \simeq \frac{1}{|S|}\sum_{i=1}^{|S|} \tau\big(\kappa(y_i, F(x_i))\big),$$

with $\kappa(y, F(x)) = y\,\Omega(F(x))$ the signed "complexity margin" and $\tau(v) = \max(0, -v)$ a hinge-style loss. The per-sample implementation complexity $\Omega(F(x))$ is the average cost per window,
$$\Omega(F(x)) = \frac{1}{m}\sum_{k=1}^m r_k(x)\, \Omega(f_k), \qquad r_k(x) = \prod_{j<k} \mathbb{1}\big(F_j(x)+T_j>0\big), \tag{2}$$
where the $f_k$ are weak learners, $F_k = \sum_{j=1}^k f_j$, the $T_j$ are rejection thresholds, and $m$ is the number of cascade stages.
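
To make the joint objective concrete, the following Python sketch evaluates the per-window complexity of Eq. (2) and the Lagrangian of Eq. (1) for a toy cascade. It is an illustrative reimplementation under stated assumptions, not the authors' code; the feature costs, thresholds, and scores passed in are hypothetical.

```python
import numpy as np

def complexity_per_window(stage_scores, thresholds, stage_costs):
    """Eq. (2): average cost of the weak learners actually evaluated on a window.

    stage_scores: cumulative scores F_1(x), ..., F_m(x)
    thresholds:   rejection thresholds T_1, ..., T_m
    stage_costs:  feature costs Omega(f_1), ..., Omega(f_m)
    """
    m = len(stage_costs)
    total, alive = 0.0, 1.0
    for F_k, T_k, cost in zip(stage_scores, thresholds, stage_costs):
        total += alive * cost          # r_k(x) * Omega(f_k)
        alive *= float(F_k + T_k > 0)  # does the window survive to the next stage?
    return total / m

def lagrangian_risk(labels, cum_scores, thresholds, stage_costs, eta):
    """Eq. (1): L[F] = R_E[F] + eta * R_C[F] over a sample set."""
    R_E, R_C = 0.0, 0.0
    for y, scores in zip(labels, cum_scores):
        omega = complexity_per_window(scores, thresholds, stage_costs)
        R_E += np.exp(-y * scores[-1])   # exponential classification loss
        R_C += max(0.0, -y * omega)      # tau(kappa) = max(0, -y * Omega(F(x)))
    n = len(labels)
    return R_E / n + eta * R_C / n
```

With $\eta = 0$ this reduces to the AdaBoost risk; increasing $\eta$ makes computation spent on (eventually rejected) negatives progressively more expensive.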

2. CompACT Boosting Algorithm and Learning Dynamics

Learning proceeds via functional gradient descent on $\mathcal{L}[F]$. The negative-gradient score for a candidate weak learner $g$ combines classification and computational cost:
$$D[g] = \frac{1}{|S|} \sum_{i=1}^{|S|} y_i \left[ \omega_i\, g(x_i) + \frac{\eta\, r_i\, \psi_i}{m+1}\,\Omega(g(x_i)) \right], \tag{3}$$
where $\omega_i = e^{-y_i F(x_i)}$ and $\psi_i = -\tau'\big(y_i\,\Omega(F(x_i))\big)$.

At each boosting round, the weak learner maximizing $D[g]$ is selected (a minimal sketch of one round follows the list below):

  • Compute $C_g = \sum_i y_i \omega_i\, g(x_i)$
  • Compute $K_g = \sum_i y_i r_i \psi_i\, \Omega(g(x_i))$
  • Score: $D[g] = \big(C_g + \eta K_g/(m+1)\big)/|S|$
  • Select $g^* = \arg\max_g D[g]$
  • Find the optimal step size $\alpha^*$ via a 1-D line search minimizing $\mathcal{L}[F+\alpha g^*]$
  • Update: $F \leftarrow F + \alpha^* g^*$
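
The selection rule of Eq. (3) and the subsequent line search can be sketched as follows. This is a schematic, not the published implementation: the candidate pool, the sample-weight vectors, and the line-search grid are illustrative assumptions, and the line search shown optimizes only the classification part of $\mathcal{L}[F+\alpha g^*]$.

```python
import numpy as np

def boosting_round(candidates, X, y, F, w, r, psi, eta, m,
                   alphas=np.linspace(0.01, 2.0, 50)):
    """One CompACT boosting round (schematic).

    candidates: list of (g, cost) pairs, with g(x) -> real-valued score
    F:   current cumulative scores F(x_i)
    w:   omega_i = exp(-y_i F(x_i))
    r:   r_i, indicator that sample i reaches the current stage
    psi: psi_i = -tau'(y_i * Omega(F(x_i)))
    """
    best_g, best_gx, best_D = None, None, -np.inf
    for g, cost in candidates:
        gx = np.array([g(x) for x in X])
        C_g = np.sum(y * w * gx)                  # classification term
        K_g = np.sum(y * r * psi * cost)          # complexity term
        D = (C_g + eta * K_g / (m + 1)) / len(X)
        if D > best_D:
            best_g, best_gx, best_D = g, gx, D
    # crude 1-D line search for alpha*; the full objective would also include
    # the complexity term of L[F + alpha * g*]
    losses = [np.mean(np.exp(-y * (F + a * best_gx))) for a in alphas]
    alpha_star = alphas[int(np.argmin(losses))]
    return best_g, alpha_star
```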

The algorithm penalizes complex features during early stages (when more windows remain), and relaxes this constraint in later stages as $m$ increases. This scheduling is a principled consequence of the joint loss in (1) (Cai et al., 2015).

3. Modeling Feature Complexity

CompACT accommodates features with heterogeneous computational costs, assigning each a cost $\Omega$ that enters directly into the complexity risk:

  • Pre-computed features: Aggregate Channel Features (ACF), $\Omega = 1$.
  • Just-in-time (JIT) features: evaluated only on the windows surviving at a given stage.
    • Self-Similarity (SS): $\Omega = 2$
    • Checkerboard (CB): $2\times 2$ filters on 10 ACF channels, $\Omega = 4$
    • LDA basis filters: $3\times 3$, PCA-like, on ACF, $\Omega = 9$
    • Small CNN conv5 channels: $\Omega_{\text{CNN}}$ for the initial trigger (an entire CNN forward pass), then 1 per JIT feature
    • CNN-Checkerboard (CNNCB): on conv5 feature maps, $\Omega = 4$ per feature

For CNNs, a "trigger" cost $\Omega_{\text{CNN}}$ is charged the first time a CNN-based feature is used on a window; subsequent conv5 feature extractions within that window incur only the small per-feature cost. This cost model enables complexity-aware scheduling of diverse feature types.
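
A minimal sketch of this per-window cost accounting, assuming the cost table above; the value of $\Omega_{\text{CNN}}$ and the feature names are placeholders, not the paper's calibrated numbers.

```python
# Illustrative per-window cost accounting with a one-time CNN "trigger" charge.
FEATURE_COST = {"ACF": 1, "SS": 2, "CB": 4, "LDA": 9, "CNN": 1, "CNNCB": 4}
OMEGA_CNN = 50  # hypothetical cost of one small-CNN forward pass on a window

def window_cost(evaluated_features):
    """Total cost of the features evaluated on one window, charging the CNN
    forward pass only once, on the first CNN-based feature."""
    cost, cnn_triggered = 0, False
    for name in evaluated_features:
        if name in ("CNN", "CNNCB") and not cnn_triggered:
            cost += OMEGA_CNN       # first use of conv5 maps: pay the forward pass
            cnn_triggered = True
        cost += FEATURE_COST[name]  # per-feature JIT cost
    return cost

# Example: cheap early-stage features plus two CNN-based features late in the cascade
print(window_cost(["ACF", "ACF", "SS", "CB", "CNNCB", "CNN"]))  # 1+1+2+4 + 50+4 + 1 = 63
```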

4. Embedded Cascade Structure and Feature Scheduling

CompACT learns an embedded cascade of $M$ stages, each weak learner $f_k$ a depth-2 decision tree with rejection threshold $T_k$. The cumulative score at stage $k$ is $F_k(x) = \sum_{j=1}^k f_j(x)$, and a window is rejected as soon as $F_k(x) + T_k \leq 0$:
$$\mathrm{DECIDE}(x) = \begin{cases} \text{reject} & \text{if } \exists\,k:\; F_k(x)+T_k \le 0, \\ \text{accept} & \text{if } F_M(x) + T_M > 0. \end{cases}$$
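
The rejection logic is simple enough to state directly in code; the sketch below assumes the weak learners and thresholds have already been learned.

```python
def decide(x, weak_learners, thresholds):
    """Embedded-cascade decision rule: reject at the first stage k where
    F_k(x) + T_k <= 0; accept only if the window survives all M stages.

    weak_learners: list of callables f_k(x) (e.g., depth-2 decision trees)
    thresholds:    per-stage rejection thresholds T_k
    """
    F = 0.0
    for f_k, T_k in zip(weak_learners, thresholds):
        F += f_k(x)              # cumulative score F_k(x)
        if F + T_k <= 0:
            return "reject"      # early exit: no further (costly) features computed
    return "accept"
```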

Empirically, early cascade stages rely on cheap ACF features, intermediate stages incorporate SS and CB, while expensive CNNCB and CNN features are reserved for final stages. As most windows are eliminated early, only a small fraction incurs the cost of high-complexity features.

5. Integration of Deep Convolutional Neural Networks

Deep CNN conv5-layer activations are integrated as another JIT feature family with a per-feature cost structure as previously described. CompACT typically refrains from selecting raw conv5 CNN features early in the cascade, instead favoring CNNCB filters when advantageous.

To incorporate a large, ImageNet-pretrained CNN (e.g., AlexNet or VGG), the network is embedded as the final weak learner $g^*$ at stage $M$, and boosting computes the optimal step size $\alpha^*$ for it. During inference, the early cascade stages process all windows with fast features at approximately 4 fps on the CPU, while only the roughly 10% of windows surviving after NMS are forwarded to the large CNN (the added GPU processing runs at about 2 fps). This architecture generalizes the combination of a proposal stage with a CNN, but operates as a unified single-pass system.
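
A schematic of this single-pass inference flow, with placeholder components (`cascade`, `nms`, `large_cnn`) standing in for the learned models; the interfaces are assumptions for illustration only.

```python
def detect(image, cascade, nms, large_cnn, alpha_star, T_final):
    """Single-pass CompACT-Deep inference (schematic): cheap cascade stages prune
    windows, and the large pretrained CNN acts as the last weak learner on the
    few surviving candidates."""
    candidates = cascade.scan(image)             # early stages, fast features (CPU)
    survivors = nms(candidates)                  # roughly 10% of windows survive
    detections = []
    for window, score in survivors:
        score += alpha_star * large_cnn(window)  # large CNN as final weak learner (GPU)
        if score + T_final > 0:
            detections.append((window, score))
    return detections
```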

6. Empirical Results and Comparative Analysis

CompACT was extensively evaluated on the Caltech and KITTI pedestrian datasets, using the log-average miss rate (MR) over the FPPI range $[10^{-2}, 10^0]$ for Caltech and area under the curve (AUC) for KITTI. Runtime measurements employed a 2.1 GHz Intel Xeon CPU and an NVIDIA K40M GPU.
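
For reference, the Caltech log-average MR averages the miss rate at FPPI points spaced evenly in log space over $[10^{-2}, 10^0]$. Below is a minimal sketch, assuming a miss-rate-vs-FPPI curve is already available and using a simple nearest-lower-FPPI sampling convention.

```python
import numpy as np

def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Geometric mean of the miss rate sampled at log-spaced FPPI references
    in [1e-2, 1e0]. `fppi` and `miss_rate` describe the detector's curve and
    are assumed sorted by increasing FPPI."""
    refs = np.logspace(-2.0, 0.0, num_points)
    samples = []
    for ref in refs:
        idx = np.where(np.asarray(fppi) <= ref)[0]
        # if the curve never reaches this FPPI, count a miss rate of 1.0
        samples.append(miss_rate[idx[-1]] if len(idx) else 1.0)
    return float(np.exp(np.mean(np.log(np.maximum(samples, 1e-10)))))
```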

Table 1: Single-feature Cascades vs. CompACT on Caltech

| Method | log-avg MR [%] | time (s) |
|---|---|---|
| ACF-only | 42.6 | 0.07 |
| SS-only | 34.3 | 0.08 |
| CB-only | 37.9 | 0.23 |
| LDA-only | 37.2 | 0.16 |
| CNN-only | 28.1 | 0.87 |
| CNNCB-only | 26.9 | 2.05 |
| CompACT-ACF | 32.2 | 0.11 |
| CompACT-CNN | 23.8 | 0.28 |

Table 2: Large CNN as Final Stage (Caltech)

| Method | AlexNet MR [%] | Δtime (s) | VGG MR [%] | Δtime (s) |
|---|---|---|---|---|
| CompACT-small-CNN only | 23.8 | — | 23.8 | — |
| + embedded large CNN-Alex | 15.0 | +0.10 | 14.8 | +0.10 |
| + embedded large CNN-VGG | 11.8 | +0.25 | 11.8 | +0.25 |

Table 3: State-of-the-Art Comparison on Caltech "Reasonable"

| Detector | MR [%] | time (s) |
|---|---|---|
| SpatialPooling+ (ECCV’14) | 25.4 | 5.0† |
| Checkerboards (CVPR’15) | 25.6 | 4.0† |
| R-CNN (arXiv’15) | 32.9 | 2.0† |
| Katamari (ECCV) | 24.8 | 0.5† |
| CompACT-Deep (embedded VGG) | 11.8 | 2.0 |

Table 4: KITTI (Moderate) AUC and Timing

| Method | AUC [%] | time (s) |
|---|---|---|
| FilteredICF | 54.0 | 0.40 |
| pAUCEnsT | 54.5 | 0.60 |
| R-CNN (KITTI) | 50.1 | 4.0 |
| Regionlets | 61.2 | 1.0† |
| CompACT-Deep | 58.7 | 1.0 |

†Excludes proposal-generation time or is reported end-to-end.

Key observations:

  • CompACT's Lagrangian framework governs an explicit, tunable trade-off between accuracy and complexity via η\eta.
  • Boosting ranks weak learners by a composite classification-plus-computation score, naturally allocating feature types across stages.
  • Embedding AlexNet or VGG as the final stage yields single-pass detectors that significantly outperform separate proposal+CNN pipelines in both accuracy and speed.

7. Implications and Significance

CompACT operationalizes the notion of budgeted detection, allowing large, heterogeneous feature pools and optimizing for both accuracy and computational efficiency. The explicit complexity-awareness advances the state of the art on pedestrian detection benchmarks and enables practical, real-time detectors. A plausible implication is that similar frameworks could be leveraged for other detection tasks requiring heterogeneous feature integration and complexity management (Cai et al., 2015).

References

1. Z. Cai, M. Saberian, and N. Vasconcelos, "Learning Complexity-Aware Cascades for Deep Pedestrian Detection," Proc. IEEE International Conference on Computer Vision (ICCV), 2015.
