Generalization of Vision Transformers versus VGG CNNs for RF spectrogram-based drone detection
Establish whether pre-trained Vision Transformer (ViT) architectures generalise as well as or better than the tested VGG11_BN, VGG13_BN, VGG16_BN, and VGG19_BN convolutional neural networks for drone detection and classification from 2D spectrograms of radio-frequency signals in the 2.4 GHz ISM band, when trained on the provided development dataset and evaluated under real-world field-test conditions.
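One way to make "generalise as well as or better" concrete is to compare, per model, the accuracy on the development dataset with the accuracy on the field-test recordings. The sketch below is a minimal illustration of that comparison, not the authors' evaluation protocol. The function names and the choice of plain accuracy as the metric are assumptions, and the toy labels are made up solely to exercise the code.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("labels and predictions must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def generalization_gap(
    dev_true: Sequence[int],
    dev_pred: Sequence[int],
    field_true: Sequence[int],
    field_pred: Sequence[int],
) -> float:
    """Development accuracy minus field-test accuracy.

    A smaller gap means the model loses less accuracy when moved
    from the development data to field conditions.
    """
    return accuracy(dev_true, dev_pred) - accuracy(field_true, field_pred)


# Toy, made-up class labels (hypothetical; NOT results from the paper).
toy_dev_true = [0, 1, 2, 3]
toy_dev_pred = [0, 1, 2, 3]
toy_field_true = [0, 1, 2, 3]
toy_field_pred = [0, 1, 2, 0]

toy_gap = generalization_gap(toy_dev_true, toy_dev_pred, toy_field_true, toy_field_pred)
```

Running the same computation for a ViT and for each VGG variant, on identical development and field-test splits, would give directly comparable gaps. The field-test accuracy itself should be reported alongside the gap, since a model can show a small gap while being inaccurate on both sets.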
References
It remains to be seen whether more recent architectures, such as the pre-trained Vision Transformer, generalise as well or better.
— Robust Low-Cost Drone Detection and Classification in Low SNR Environments
(2406.18624 - Glüge et al., 26 Jun 2024) in Section 6 (DISCUSSION)