Domain-dependent difficulty of BFSD: coreutils vs firmware

Determine whether binary function similarity detection (BFSD) between two functions from coreutils is easier or harder than between two functions from firmware for cyber-physical systems.

Background

Existing BFSD datasets often focus on "typical" programs such as coreutils and lack representation of real-world firmware, creating potential biases in evaluation. This imbalance raises questions about how domain differences affect the difficulty of function similarity tasks.

The EXHIB paper (Fan et al., 2026) explicitly notes that it is unknown whether comparisons within standard desktop utilities (e.g., coreutils) are inherently easier or harder than comparisons within firmware for cyber-physical systems. This gap motivates the construction of diversified benchmarks to study domain effects.

References

For example, it is unknown whether comparing two coreutils functions is easier or harder than comparing firmware functions for cyberphysical systems.

EXHIB: A Benchmark for Realistic and Diverse Evaluation of Function Similarity in the Wild (2604.01554 - Fan et al., 2 Apr 2026) in Section 1: Introduction