Generalization of Vision Transformers versus VGG CNNs for RF spectrogram-based drone detection

Establish whether pre-trained Vision Transformer (ViT) architectures generalise as well as or better than the tested VGG11_BN, VGG13_BN, VGG16_BN, and VGG19_BN convolutional neural networks for drone detection and classification from 2D spectrograms of radio-frequency signals in the 2.4 GHz ISM band, when trained on the provided development dataset and evaluated under real-world field-test conditions.
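The comparison can be framed as a per-model report. Each model is scored with balanced accuracy (the mean of per-class recalls) on the development test split and on the field-test recordings. The gap between the two scores is one way to quantify generalisation. Below is a minimal plain-Python sketch. The function names, the class labels, and the dev-minus-field "gap" are illustrative assumptions, not the paper's evaluation protocol. The metric itself matches scikit-learn's `balanced_accuracy_score` with its default settings.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes that occur in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


def generalization_report(dev, field):
    """dev and field are (y_true, y_pred) pairs for one model (hypothetical helper)."""
    dev_ba = balanced_accuracy(*dev)
    field_ba = balanced_accuracy(*field)
    return {"dev": dev_ba, "field": field_ba, "gap": dev_ba - field_ba}


if __name__ == "__main__":
    # Toy labels, not real data: 4 noise segments and 2 drone segments.
    y_true = ["noise", "noise", "noise", "noise", "drone", "drone"]
    y_pred = ["noise", "noise", "noise", "drone", "drone", "noise"]
    print(balanced_accuracy(y_true, y_pred))  # 0.625 = (3/4 + 1/2) / 2
```

Balanced accuracy is used here instead of plain accuracy because the class frequencies in field recordings need not match the development set. Plain accuracy would give 4/6 on the toy example above, which is dominated by the majority class.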

Background

The paper compared four VGG CNN architectures with batch normalisation (VGG11_BN, VGG13_BN, VGG16_BN, and VGG19_BN), trained on 2D spectrograms computed from 74.9 ms IQ segments. It found no meaningful performance differences among them, either on the development dataset or in the field test (balanced accuracy ≈0.80).
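Both the VGG and ViT candidates would consume the same input: a 2D spectrogram of a 74.9 ms IQ segment. Below is a minimal NumPy sketch of one way to compute such an input. It assumes a sample rate of 14 MS/s, so that 2^20 samples span about 74.9 ms. It also assumes a 1024-point Hann-windowed FFT with non-overlapping frames and log-power (dB) scaling. All of these parameters are illustrative assumptions, not the authors' confirmed settings. Because IQ samples are complex, the sketch takes the full FFT rather than `rfft` and centres DC with `fftshift`.

```python
import numpy as np

# Assumed acquisition parameters (hypothetical, for illustration only):
SAMPLE_RATE = 14e6      # complex samples per second
SEGMENT_LEN = 2 ** 20   # 1,048,576 samples, about 74.9 ms at 14 MS/s
N_FFT = 1024            # FFT size = number of frequency bins per frame


def iq_spectrogram(iq, n_fft=N_FFT, hop=N_FFT):
    """Log-power spectrogram (dB) of a complex IQ segment.

    Returns an array of shape (n_fft, n_frames): rows are frequency bins
    ordered from -fs/2 to +fs/2, columns are time frames.
    """
    iq = np.asarray(iq, dtype=np.complex64)
    if len(iq) < n_fft:
        raise ValueError("segment shorter than one FFT frame")
    n_frames = 1 + (len(iq) - n_fft) // hop
    window = np.hanning(n_fft).astype(np.float32)
    # (n_frames, n_fft) index matrix selecting each frame's samples
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = iq[idx] * window
    # IQ data is complex, so take the full FFT (not rfft) to keep negative
    # frequencies, then shift DC to the centre row.
    spec = np.fft.fftshift(np.fft.fft(frames, axis=1), axes=1)
    power_db = 10.0 * np.log10(np.abs(spec) ** 2 + 1e-12)
    return power_db.T


if __name__ == "__main__":
    # Synthetic signal: a complex tone at +1.75 MHz, which falls exactly on
    # FFT bin 128 (1.75e6 / 14e6 * 1024), i.e. row 512 + 128 after fftshift.
    t = np.arange(SEGMENT_LEN) / SAMPLE_RATE
    tone = np.exp(2j * np.pi * 1.75e6 * t)
    spec = iq_spectrogram(tone)
    print(spec.shape)                          # (1024, 1024)
    print(int(np.argmax(spec.mean(axis=1))))  # 640
```

With these assumed settings, a single segment yields a square 1024 × 1024 image. A pre-trained image model would typically need this image resized to its own input resolution.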

Although this approach proved robust, the authors explicitly leave open whether newer architectures, specifically pre-trained Vision Transformers, generalise as well as or better than the VGG baselines on this RF spectrogram classification task.
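The two architecture families see a spectrogram differently. A VGG network slides small 3×3 convolution kernels over the image. A ViT first cuts the image into fixed-size patches and treats each flattened patch as a token for self-attention. Below is a minimal NumPy sketch of that tokenisation step. It assumes a single-channel spectrogram and 16×16 patches, as in ViT-B/16. At 224×224, this setting yields 14 × 14 = 196 tokens. The projection weights here are random placeholders, not pre-trained ones. A real pre-trained ViT-B/16 would also add position embeddings and a class token. It expects a 3-channel 224×224 input, so the spectrogram would have to be resized and replicated across channels. That preprocessing is an assumption, not something the paper specifies.

```python
import numpy as np


def spectrogram_to_patches(spec, patch=16):
    """Cut a 2D spectrogram into non-overlapping patch x patch tiles and
    flatten each tile, returning a (num_patches, patch * patch) array.

    Tiles are ordered row by row (top-left first). Height and width are
    cropped down to multiples of `patch`.
    """
    spec = np.asarray(spec)
    h, w = spec.shape
    h, w = h - h % patch, w - w % patch
    tiles = spec[:h, :w].reshape(h // patch, patch, w // patch, patch)
    tiles = tiles.transpose(0, 2, 1, 3)  # (tile_row, tile_col, patch, patch)
    return tiles.reshape(-1, patch * patch)


def embed_patches(tokens, weight, bias):
    """Linear patch embedding: one d_model-dimensional vector per token.
    In a pre-trained ViT these parameters are learned; here they are inputs."""
    return tokens @ weight + bias


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spec = rng.standard_normal((224, 224))   # stand-in for a resized spectrogram
    tokens = spectrogram_to_patches(spec)
    print(tokens.shape)                      # (196, 256)
    d_model = 768                            # ViT-B hidden size
    w = rng.standard_normal((256, d_model)) * 0.02  # random, NOT pre-trained
    b = np.zeros(d_model)
    print(embed_patches(tokens, w, b).shape) # (196, 768)
```

One open point this makes visible: in a spectrogram, a patch's position encodes frequency and time rather than spatial layout. How well ImageNet-pre-trained patch and position embeddings transfer to that setting is part of what the question asks.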

References

It remains to be seen whether more recent architectures, such as the pre-trained Vision Transformer, generalise as well or better.

— Robust Low-Cost Drone Detection and Classification in Low SNR Environments (2406.18624 - Glüge et al., 26 Jun 2024) in Section 6 (DISCUSSION)
