Papers
Topics
Authors
Recent
2000 character limit reached

Towards fairer public transit: Real-time tensor-based multimodal fare evasion and fraud detection (2510.02165v1)

Published 2 Oct 2025 in cs.SE

Abstract: This research introduces a multimodal system designed to detect fraud and fare evasion in public transportation by analyzing closed circuit television (CCTV) and audio data. The proposed solution uses the Vision Transformer for Video (ViViT) model for video feature extraction and the Audio Spectrogram Transformer (AST) for audio analysis. The system implements a Tensor Fusion Network (TFN) architecture that explicitly models unimodal and bimodal interactions through a 2-fold Cartesian product. This advanced fusion technique captures complex cross-modal dynamics between visual behaviors (e.g., tailgating,unauthorized access) and audio cues (e.g., fare transaction sounds). The system was trained and tested on a custom dataset, achieving an accuracy of 89.5%, precision of 87.2%, and recall of 84.0% in detecting fraudulent activities, significantly outperforming early fusion baselines and exceeding the 75% recall rates typically reported in state-of-the-art transportation fraud detection systems. Our ablation studies demonstrate that the tensor fusion approach provides a 7.0% improvement in the F1 score and an 8.8% boost in recall compared to traditional concatenation methods. The solution supports real-time detection, enabling public transport operators to reduce revenue loss, improve passenger safety, and ensure operational compliance.

Summary

We haven't generated a summary for this paper yet.

Whiteboard

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.