Evaluation scores and dataset bias in salient object detection benchmarking

Develop evaluation measures and benchmarking protocols for salient object detection that mitigate dataset bias (including center bias and annotation subjectivity) and yield reliable, comparable scores across models and datasets. Establish metrics that reflect segmentation quality and model behavior more faithfully than current PR/ROC/AUC/F-measure variants do when datasets carry inherent biases.
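To make the target concrete, the following Python sketch computes the scores this problem statement refers to (MAE, the classic F-measure, and PR-curve points) from a real-valued saliency map and a binary ground-truth mask. The beta^2 = 0.3 weighting and the adaptive threshold (twice the mean saliency, clipped to 1) follow common practice in the SOD literature; the function names and structure are illustrative assumptions, not any benchmark's official implementation.

    import numpy as np

    def mae(sal, gt):
        # Mean Absolute Error between a saliency map in [0, 1]
        # and a binary ground-truth mask of the same shape.
        return np.mean(np.abs(sal - gt.astype(float)))

    def f_measure(sal, gt, beta2=0.3, threshold=None):
        # Classic F-measure at a single binarization threshold.
        # beta2 = 0.3 weights precision over recall, as is common in
        # the SOD literature; the default threshold (twice the mean
        # saliency, clipped to 1) is one widely used adaptive choice.
        if threshold is None:
            threshold = min(2.0 * sal.mean(), 1.0)
        pred = sal >= threshold
        tp = np.logical_and(pred, gt).sum()
        precision = tp / max(pred.sum(), 1)
        recall = tp / max(gt.sum(), 1)
        denom = beta2 * precision + recall
        return (1.0 + beta2) * precision * recall / denom if denom > 0 else 0.0

    def pr_points(sal, gt, n_thresholds=256):
        # Precision/recall pairs swept over uniform thresholds,
        # i.e. the points from which PR curves are drawn.
        precisions, recalls = [], []
        for t in np.linspace(0.0, 1.0, n_thresholds):
            pred = sal >= t
            tp = np.logical_and(pred, gt).sum()
            precisions.append(tp / max(pred.sum(), 1))
            recalls.append(tp / max(gt.sum(), 1))
        return np.array(precisions), np.array(recalls)

In a benchmark these per-image scores are averaged over a dataset, which is exactly where bias enters: under these same formulas, a dataset whose objects are large and centered can reward trivially center-weighted predictions.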

Background

Throughout the paper, the authors discuss limitations of existing evaluation metrics, including discrepancies between PR and ROC curves and shortcomings of MAE and the classic F-measure, which motivate alternatives such as the weighted Fβ measure. They also analyze center bias and scene complexity, showing how dataset characteristics can skew both scores and model rankings.
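One way to make the center-bias analysis reproducible is to measure how far annotated object centroids sit from the image center. The sketch below is a minimal illustration assuming binary ground-truth masks; the helper name and the half-diagonal normalization are choices of this example, not the paper's exact protocol.

    import numpy as np

    def centroid_center_distances(masks):
        # For each binary ground-truth mask, return the distance of the
        # object's centroid from the image center, normalized by the
        # half-diagonal (0 = dead center, 1 = image corner). A
        # distribution piled up near 0 is the signature of center bias.
        dists = []
        for gt in masks:
            h, w = gt.shape
            ys, xs = np.nonzero(gt)
            if ys.size == 0:
                continue  # skip images with empty annotations
            d = np.hypot(ys.mean() - (h - 1) / 2.0,
                         xs.mean() - (w - 1) / 2.0)
            dists.append(d / (0.5 * np.hypot(h - 1, w - 1)))
        return np.asarray(dists)

Where these distances are small across a dataset, even a trivial centered-Gaussian map can score competitively, which is why bias-aware protocols (for example, reporting scores stratified by object location or scene complexity) are part of the open problem.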

In the abstract, the authors explicitly identify evaluation scores and dataset bias as open problems and indicate the need for improved solutions to enable fairer, more informative benchmarking of salient object detection and segmentation methods.

References

Finally, we propose probable solutions for tackling several open problems such as evaluation scores and dataset bias, which also suggest future research directions in the rapidly-growing field of salient object detection.

Salient Object Detection: A Benchmark (arXiv:1501.02741, Borji et al., 2015), Abstract.