CAN-D: A Modular Four-Step Pipeline for Comprehensively Decoding Controller Area Network Data

Published 9 Jun 2020 in cs.OH and eess.SP | (2006.05993v2)

Abstract: CANs are a broadcast protocol for real-time communication of critical vehicle subsystems. Original equipment manufacturers of passenger vehicles hold secret their mappings of CAN data to vehicle signals, and these definitions vary according to make, model, and year. Without these mappings, the wealth of real-time vehicle information hidden in the CAN packets is uninterpretable, impeding vehicle-related research. Guided by the 4-part CAN signal definition, we present CAN-D (CAN-Decoder), a modular, 4-step pipeline for identifying each signal's boundaries (start bit, length), endianness (byte order), signedness (bit-to-integer encoding), and by leveraging diagnostic standards, augmenting a subset of the extracted signals with physical interpretation. We provide a comprehensive review of the CAN signal reverse engineering research. Previous methods ignore endianness and signedness, rendering them incapable of decoding many standard CAN signal definitions. Incorporating endianness grows the search space from 128 to 4.72E21 signal tokenizations and introduces a web of changing dependencies. We formulate, formally analyze, and provide an efficient solution to an optimization problem, allowing identification of the optimal set of signal boundaries and byte orderings. We provide two novel, state-of-the-art signal boundary classifiers (both superior to previous approaches in precision and recall in three different test scenarios) and the first signedness classification algorithm, which exhibits a >97% F-score. CAN-D is the only solution with the potential to extract any CAN signal. In evaluation on 10 vehicles, CAN-D's average $\ell^1$ error is 5x better than all previous methods and exhibits lower average error, even when considering only signals that meet prior methods' assumptions. CAN-D is implemented in lightweight hardware, allowing for an OBD-II plugin for real-time in-vehicle CAN decoding.




Summary

The paper introduces CAN-D, a modular four-step pipeline for comprehensively decoding vehicle Controller Area Network (CAN) data, addressing signal boundaries, endianness, signedness, and physical interpretation.
In evaluation on 10 vehicles, CAN-D's average $\ell^1$ error is about 5x lower than that of previous methods, which corresponds to roughly an 80% reduction in error.
The comprehensive decoding approach of CAN-D enables potential applications in cybersecurity, driver behavior analysis, and real-time vehicle monitoring.

Analysis of the CAN-D Paper

The paper presents CAN-Decoder (CAN-D), a modular four-step pipeline for decoding Controller Area Network (CAN) data in vehicles. It is described as the first complete, vehicle-agnostic approach to CAN signal reverse engineering: beyond identifying signal boundaries, it also determines endianness and signedness and attaches a physical interpretation to a subset of signals. All of these properties are needed to decode CAN signals as they are defined in industry-standard DBC files. Previous methods typically address only part of the problem, such as signal boundaries, or assume a single endianness and an unsigned encoding, which prevents them from decoding many standard signal definitions.

Summary of the Methodology

CAN-D is structured as a four-step pipeline:

1. Signal Boundary Classification: This step determines where signals start and end within a CAN message. The paper details both a novel unsupervised heuristic and a supervised machine learning method for predicting signal boundaries. Both surpass previous approaches in precision and recall.
2. Endianness Optimization: Existing methods have largely assumed big-endian byte ordering, but CAN signal definitions permit both big- and little-endian byte orders. CAN-D formulates and solves an optimization problem that jointly determines signal boundaries and byte orderings. The solution takes each candidate signal's join and cut penalties into account, so signals can be extracted regardless of byte order.
3. Signedness Classification: The paper presents the first algorithm designed to classify signals as signed or unsigned. This matters because a signed signal uses two's complement encoding, so the same bits map to a different integer than they would under an unsigned reading.
4. Physical Interpretation via Diagnostic Matching: Incorporating techniques from earlier work (specifically ACTT by Verma et al.), CAN-D matches tokenized signals to diagnostic data collected from the vehicle. This gives a subset of the signals a physical meaning by adding labels, units, scale, and offset values.
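
To make the four parts of a signal definition concrete, the following is a minimal Python sketch of how one payload decodes differently depending on boundaries, byte order, and signedness. It is not the paper's implementation. The names `SignalDef` and `decode` are hypothetical, the bit-numbering convention (bit 0 is the most significant bit of byte 0) is an assumption chosen for simplicity, and little-endian fields are restricted to whole bytes to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalDef:
    """Hypothetical container for the parts of a signal definition."""
    start_bit: int        # assumed convention: bit 0 = MSB of byte 0
    length: int           # number of bits in the signal
    little_endian: bool   # byte order of the field
    signed: bool          # two's complement if True
    scale: float = 1.0    # physical = raw * scale + offset
    offset: float = 0.0


def decode(payload: bytes, sig: SignalDef) -> float:
    """Extract one signal from a CAN payload and convert it to a physical value."""
    n_bits = 8 * len(payload)
    if sig.length <= 0 or sig.start_bit < 0 or sig.start_bit + sig.length > n_bits:
        raise ValueError("signal does not fit in payload")

    # Step 1 (boundaries): cut the bit field out of the payload.
    as_int = int.from_bytes(payload, "big")
    shift = n_bits - sig.start_bit - sig.length
    raw = (as_int >> shift) & ((1 << sig.length) - 1)

    # Step 2 (endianness): reverse the byte order for little-endian fields.
    if sig.little_endian:
        if sig.start_bit % 8 or sig.length % 8:
            raise ValueError("this sketch only handles byte-aligned little-endian fields")
        raw = int.from_bytes(raw.to_bytes(sig.length // 8, "big"), "little")

    # Step 3 (signedness): reinterpret as two's complement if signed.
    if sig.signed and raw >= 1 << (sig.length - 1):
        raw -= 1 << sig.length

    # Step 4 (physical interpretation): apply scale and offset.
    return raw * sig.scale + sig.offset


if __name__ == "__main__":
    payload = bytes([0x00, 0x00, 0xFF, 0x38, 0x00, 0x00, 0x00, 0x00])
    for little in (False, True):
        for signed in (False, True):
            sig = SignalDef(start_bit=16, length=16, little_endian=little, signed=signed)
            print(f"little_endian={little!s:5} signed={signed!s:5} -> {decode(payload, sig)}")
```

The same two bytes (`0xFF 0x38`) read as 65336 when big-endian unsigned, -200 when big-endian signed, and 14591 when little-endian. This illustrates why a method that assumes one byte order and an unsigned encoding can recover correct boundaries and still produce the wrong values.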

Numerical Results and Comparison

The evaluation uses data from 10 vehicles. CAN-D's average $\ell^1$ error is about 5x lower than that of all previous methods (roughly an 80% reduction), and it remains lower even when only signals that satisfy the prior methods' assumptions are considered. The pipeline is also implemented in lightweight hardware as an OBD-II plugin, demonstrating real-time, in-vehicle decoding. The boundary analysis, run under three test scenarios, shows CAN-D's classifiers outperforming previous approaches in precision and recall, including on non-obvious boundaries, which are critical for correct decoding.
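
The abstract's "5x better" average error and a percentage reduction are two ways of stating the same comparison. As a short worked example, assume (purely for illustration) that $e_{\text{prev}}$ denotes a prior method's average error and $e_{\text{CAN-D}}$ denotes CAN-D's; these symbols are not taken from the paper:

```latex
\[
  e_{\text{CAN-D}} = \frac{e_{\text{prev}}}{5}
  \quad\Longrightarrow\quad
  \frac{e_{\text{prev}} - e_{\text{CAN-D}}}{e_{\text{prev}}}
  = 1 - \frac{1}{5}
  = 0.8 .
\]
```

An error that is exactly five times lower is therefore an 80% reduction, and any reduction above 80% implies an improvement factor greater than five.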

Key Innovations and Implications

CAN-D introduces multiple innovations to the field:

Comprehensive End-to-End Decoding: CAN-D covers every part of CAN signal encoding (boundaries, endianness, signedness) in a single pipeline. Because the pipeline is modular, researchers can improve each step independently in future work.
Potential to Impact Multiple Domains: Unlocking the full potential of CAN data has far-reaching implications. The ability to accurately interpret CAN signals can empower cybersecurity research, facilitate driver behavior analysis, improve aftermarket automotive performance tuning, and advance vehicle-to-vehicle communication technologies.
Lightweight, Real-Time Solution: The authors designed a hardware prototype that can be deployed to decode CAN data in situ. This enables practical applications such as real-time vehicle monitoring and analytics.

Future Directions

While the work presented in CAN-D is extensive, there are several potential directions for future research:

Integration with Machine Learning Models: More sophisticated models for each component of the pipeline may allow for better handling of edge cases such as highly noise-prone signals or very short signal sequences.
Expansion to Different Vehicle Types: Different vehicle platforms, such as electric or hybrid models, may use proprietary signal encoding that could require additional adaptation of the CAN-D pipeline.
Development of a Universal Dataset: Building a universal and comprehensive dataset with wide coverage of different scenarios would greatly benefit further research and development in the field.

The research presented in this paper marks a significant advancement in understanding and utilizing vehicle CAN data and sets a solid foundation on which further developments and applications can be built.
